Abstract


Dependence analysis is an important part of a restructuring compiler. Many tests
have been proposed for dependence analysis. Most of them are not exact, and the
exact tests are more complicated. A simple but exact dependence test based on genetic
algorithms was proposed earlier, but it was applicable only to problems with known
bounds. Here this test is adapted to the dependence problem with unknown bounds.
WHILE loops and certain DO loops with exit statements are generally treated as
sequential programs. The problem in parallelizing WHILE loops is that the iteration
space is unknown. Nevertheless, some WHILE loops are parallelizable. Currently there is
no test that can analyze the dependence problem of WHILE loops at compile time.
GAtest is extended to solve some cases of the dependence problem of WHILE loops.
Chapter 1
Introduction 


The major objective in the research and development of parallel processing systems is
to extract high performance for applications that require heavy number-crunching
capabilities or the manipulation of huge amounts of data. Two simple but key ideas
lead to parallelism in computer architectures:

1. Separating functionality:

The processing of a single instruction is partitioned into several parts, each of
which is performed by a different functional unit of the ALU. At any time,
distinct parts of a number of instructions can therefore be executed in parallel.
Parallelism of this kind is called pipeline parallelism.

2. Duplicating parts and separating the application:

Parts of a computer system are replicated. For example, if ALUs are duplicated
we have a superscalar architecture. Replication of CPUs gives a multiprocessor
system. Similarly, whole computer systems can be replicated and
interconnected by a network. Such an environment, which can be homogeneous or
heterogeneous, provides a PVM (Parallel Virtual Machine)-like computing
environment. The parts of an application are likewise separated so that they run
on these replicated parts in parallel.

This clearly indicates that several factors have to be considered in designing a high
performance computer system. The algorithms should be carefully designed to
match the architectures and programming environments.

The other important factor that impedes the effective use of parallel computing
systems is software inertia. The investment in developing software for sequential
computers is enormous. Manual re-engineering of this software to run on
parallel systems is both infeasible and too expensive. What is needed, therefore, is
a program restructurer, which takes sequential programs as input and produces
optimized, equivalent, flow correct parallel programs with little or no human
intervention. In fact, parallel computers have remained in the research and development
stage for so long because of the lack of efficient code restructuring technology.

1.1 Restructuring compiler 

A restructuring compiler takes a sequential program as input and produces an
equivalent parallel program for a target parallel machine as output by performing a
number of transformations to extract the parallelism. Most restructuring compilers
have been developed as research tools and work at the source code level. Besides
standard compiler optimization techniques, viz., loop fusion, strip mining and loop
interchange, new transformations have been developed specifically for restructuring
sequential code for execution on a target parallel machine. A restructuring compiler
has to analyze the memory references to determine which transformations may
be applied. The transformations must preserve the flow correctness of the original
program. This can be achieved by ensuring that memory references are consistent,
by imposing dependence relationships among references. The flow correct parallel
program will also be logically correct if the corresponding sequential program itself
is logically correct.

The execution order of statements in a sequential program preserves the dependence
relations. However, when a number of these statements have to be executed
in parallel, the dependence relations among the memory references made by the
statements may be violated. In order to understand to what extent the dependence
relations are violated, and what techniques should be applied to resolve or eliminate
the dependences, it is essential to understand the concept of dependence relations.
1.2 Data dependence


Let S and T be a pair of statements selected arbitrarily from a sequential program.
The set of variables written by a statement is called the define set of that statement,
and similarly the set of variables read is called its use set. Statement T
is dependent on S if there exists some memory location M such that

1. S and T both reference M, and M belongs to the define set of S or T or both,

2. S precedes T in the sequential execution of the program,

3. M is not written between the time S finishes and the time T starts.

Three types of dependences exist, based upon the references to M by S and T,
namely

• T is flow-dependent on S if S writes M and then T reads it,

• T is anti-dependent on S if S reads M and then T writes it,

• T is output-dependent on S if S writes M and then T writes it again.

We use $S\,\delta^{f}\,T$ to denote flow dependence, $S\,\delta^{a}\,T$ for anti-dependence and
$S\,\delta^{o}\,T$ for output dependence.

Example 1.1:

S1: a = b + c           {S1, a} $\delta^{o}$ {S2, a} - Output dependence

S2: a = b + d           {S2, a} $\delta^{f}$ {S3, a} - Flow dependence

S3: c = sqrt(b) + a     {S1, c} $\delta^{a}$ {S3, c} - Anti dependence
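The three conditions of the definition can be checked mechanically for straight-line code. The following is only a minimal sketch, not part of the original work: the function name `classify` and the tuple layout `(label, define set, use set)` are assumptions made for illustration.

```python
def classify(stmts):
    """List (source, sink, variable, kind) dependences of a statement sequence.

    Each statement is (label, defs, uses), where defs and uses are sets of
    variable names (the define set and the use set).
    """
    deps = []
    for s in range(len(stmts)):
        s_label, s_defs, s_uses = stmts[s]
        for var in sorted(s_defs | s_uses):
            for t in range(s + 1, len(stmts)):
                t_label, t_defs, t_uses = stmts[t]
                if var in s_defs and var in t_uses:
                    deps.append((s_label, t_label, var, "flow"))
                if var in s_uses and var in t_defs:
                    deps.append((s_label, t_label, var, "anti"))
                if var in s_defs and var in t_defs:
                    deps.append((s_label, t_label, var, "output"))
                if var in t_defs:
                    # Condition 3: the location is overwritten here, so later
                    # statements no longer depend on S through this variable.
                    break
    return deps


# The three statements of Example 1.1.
EXAMPLE_1_1 = [
    ("S1", {"a"}, {"b", "c"}),
    ("S2", {"a"}, {"b", "d"}),
    ("S3", {"c"}, {"a", "b"}),
]

for dep in classify(EXAMPLE_1_1):
    print(dep)
```

Run on Example 1.1, the sketch reports the same output, anti and flow dependences that are listed in the example.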

The above dependences are explained at the granularity of statements for simplicity.
In many applications, mainly in scientific problems, loops constitute a
significant portion of the code. An empirical estimate is that program control is
localized in loops for about 80% of the time. Therefore, to exploit the fine
grain parallelism in a program, the main target should be loop parallelization.
Loop parallelization means restructuring the code inside a loop in such a way
that several iterations of the loop can be executed in parallel. However, loop
restructuring is possible only if there are no data dependences between iterations.
For example, consider the following piece of code.

a(0) = c

for (i=1; i<N; i++)
    a(i) = a(i-1) + 2

In the first iteration the value of a(1) is determined using the value of a(0). Similarly,
in the second iteration the value of a(2) is determined using a(1), and so on. This means
that the ith iteration cannot start until the (i-1)th iteration of the loop is complete.
Dependences like this, where a value generated by a previous iteration of the loop
is required for generating a value in a subsequent iteration, are called loop carried
dependences. All loop carried dependences have to be either eliminated or preserved
before restructuring a loop for parallel execution.
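The effect of a loop carried dependence can be seen by executing the iterations in a different order, which is what an unsynchronized parallel execution may do. The sketch below is illustrative only; the function names and the choice of reversed order as a stand-in for a parallel schedule are assumptions.

```python
def recurrence(order, n=6, c=5):
    """Run a(i) = a(i-1) + 2 over the iterations listed in `order`."""
    a = [0] * n
    a[0] = c
    for i in order:
        a[i] = a[i - 1] + 2
    return a


def independent(order, n=6):
    """Run a(i) = b(i) + 2, which has no dependence between iterations."""
    b = list(range(n))
    a = [0] * n
    for i in order:
        a[i] = b[i] + 2
    return a


forward = range(1, 6)
backward = range(5, 0, -1)

# The recurrence gives different results when the order changes ...
print(recurrence(forward), recurrence(backward))
# ... while the independent loop does not care about the order.
print(independent(forward), independent(backward))
```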

The problems associated with data dependences for array variables are fundamental
to the results of the investigations presented here. These problems are, therefore,
discussed in more detail in the next chapter. There are many dependence
tests [13, 2, 16, 8, 14] for the dependence problem. A general solution to the
dependence problem requires integer programming, which takes exponential time. However,
several tests suited to particular classes of problems have been found. Most of these
tests are simple and conservative. The Omega test [14] is exact, but
in the worst case it may take exponential time. None of the existing tests is suited to
analyzing loops with non-constant increments. Similarly, barring the Omega test, none
of the tests is applicable to non-linear equations. In this thesis a special
case of the dependence problem involving a non-constant loop increment, namely a
recurrence relation, is solved using genetic algorithms. It is used for WHILE loop parallelization.
The technique can be extended to general for loops of C programs in a
straightforward way. Though the solution takes exponential time in the worst case, in
practice it is found to be quite fast, as indicated by the experimental results.
1.3 Overview of thesis

The work done in this thesis can be divided into three parts. The first part is a
comprehensive study of the various dependence tests and an evaluation of their performance.
Around 50 real programs written by different users, including students and faculty,
were chosen. The lengths of these programs range from 20 to 1700 lines. The results of
the performance study are presented in Chapter 2. The second part consists of
enhancements to GAtest [12], presented in Chapter 3. The enhancements
are the following.

• Testing for a particular direction vector.

• Applicability of GAtest to unknown bounds.

• A new concept of S^-spaces (see Section 3.8) is introduced.

GAtest has been adapted for S^-spaces. The third part is concerned with the
parallelization of WHILE loops. GAtest has been adapted to solve the dependence problem
in the case of WHILE loops.
Chapter 2

Dependence Tests 


A sequential program can be considered as a sequence of statements with a
single thread of control in execution. Most programs are deterministic, and
any portion of the thread has a single entry and a single exit point. A loop is a
programming construct associated with a portion of the thread of control at
run time in which control passes through the same path repeatedly for
a finite number of times.

Example 2.1: Program 2.1.

S1: j = 0
    for i=1 , 10 do
S2:     a(i) = i
S3:     b(i) = i + 1
    endfor
S4: print "end"

Program thread : S1 , (S2 , S3)^10 , S4
Loop           : (S2 , S3)^10

In the above example the notation S^n is used to denote that S is repeated n times.

A parallel program can be considered as a set of statements with more than one
thread of control in execution. Loops have the major potential for preparing two or
more threads of an equivalent parallel program from the corresponding single thread
of the sequential program. This leads to parallelizing the loop. A parallel program
should synchronize at the points where dependences exist among threads. If there is
no dependence among threads, synchronization is needed only at the entry and exit
points of the loop, resulting in parallel execution of different iterations of the loop.

Example 2.2: Program 2.2. The equivalent parallel program of Program
2.1 is shown below.


S1: j = 0
    forall i=1 , 10 do
S2:     a(i) = i
S3:     b(i) = i + 1
    endfor
S4: print "end"

The threads of execution of a loop can be represented graphically by an iteration
space diagram. It is an N-dimensional plot in which each dimension represents an
index variable of a loop, each point represents an iteration in execution and an arrow
between points indicates a dependence between iterations. A dependence between
two statements of different iterations of a loop is called a loop carried dependence.
A more detailed diagram can be drawn using N+1 dimensions, the extra dimension
indicating execution within a single iteration. In addition to loop carried dependences,
this diagram also shows dependences between statements of a single iteration, called
loop invariant dependences.

Example 2.3: Program 2.3

    for i=1 , 4 do
S1:     a(i) = 2 + b(i)
S2:     b(i) = a(i-1) + a(i+1)
    endfor

Figure 1: Iteration space diagram of Program 2.3

Loop invariant dependences cause no problem for loop parallelization. But loop
carried dependences have to be eliminated, or must be preserved by transformations
such as loop splitting, introducing new variables, etc. The dependence problem of loops
is formally defined in the following section. In the case of dependences in loops,
issues such as the direction of the dependence and the distance between dependent
iterations come into the picture. As mentioned in Chapter 1, several tests have been
proposed for the dependence problem of loops. A comprehensive but brief study of some
well known data dependence tests is presented in this chapter.

2.1 Dependence problem of loops 

Let S and T be a pair of statements in a nest of loops. T is dependent on S if there
exist instances S' and T' of S and T respectively in the unrolled loop such that T'
is dependent on S'.

Consider the following two array references within a loop nest of depth n.

for I1 = 1 to N1 do
  for I2 = 1 to N2 do
    ...
      for In = 1 to Nn do
        X(f1(I1,I2,...,In),f2(I1,I2,...,In),...,fn(I1,I2,...,In)) = ...
        ... = X(g1(I1,I2,...,In),g2(I1,I2,...,In),...,gn(I1,I2,...,In))
      endfor
    ...
  endfor
endfor
If the above two references to X refer to the same memory location for
some iterations $I = (i_1, i_2, \ldots, i_d)$ and $J = (j_1, j_2, \ldots, j_d)$, then

\[
\begin{aligned}
f_1(I) &= g_1(J) \\
f_2(I) &= g_2(J) \\
&\;\;\vdots \\
f_n(I) &= g_n(J)
\end{aligned}
\qquad (1)
\]

and $1 \le i_k, j_k \le N_k$, $1 \le k \le n$.

In order to determine whether T is dependent on S, it must be proved that at least one pair
I, J exists that satisfies the above equations. If all $f_i$ and $g_i$ are linear functions,
then each of the equations in the set of equations (1) is also a linear equation. In
general, a linear equation in n variables with bounded variables can be
represented as follows.

\[ a_1 I_1 + a_2 I_2 + \cdots + a_n I_n = a_0 \qquad (2) \]

\[
\begin{aligned}
l_1 &\le I_1 \le u_1 \\
l_2 &\le I_2 \le u_2 \\
&\;\;\vdots \\
l_n &\le I_n \le u_n
\end{aligned}
\qquad (3)
\]

Equation (2) is called a linear diophantine equation.
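For small bounds, the dependence problem in the form of equations (2) and (3) can be decided by plain enumeration. This is only a baseline sketch to make the problem statement concrete; its cost grows exponentially with the number of variables, and the function name `solutions` is an assumption.

```python
from itertools import product


def solutions(coeffs, a0, bounds):
    """Yield every integer point with sum(a_i * I_i) == a0 and l_i <= I_i <= u_i.

    coeffs : [a_1, ..., a_n]
    bounds : [(l_1, u_1), ..., (l_n, u_n)]
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    for point in product(*ranges):
        if sum(a * x for a, x in zip(coeffs, point)) == a0:
            yield point


# a(i) = a(i-1) + 2 for 1 <= i <= 5: the element written in iteration i is
# read in iteration j when i = j - 1, that is, i - j = -1.
print(list(solutions([1, -1], -1, [(1, 5), (1, 5)])))
```

A non-empty result means that a dependence exists within the bounds; an empty result proves independence.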


2.2 Direction vector 

Let S and T be two statements inside a loop nest, with T dependent on S. This
implies that there are two iterations $I = (i_1, i_2, \ldots, i_d)$ and $J = (j_1, j_2, \ldots, j_d)$ for
which statements S and T refer to a common memory location. The direction vector
of this dependence relation can be defined as

\[ D = (sig(j_1 - i_1), sig(j_2 - i_2), \ldots, sig(j_d - i_d)) \]

where

\[
sig(j - i) =
\begin{cases}
\text{'<'} & \text{if } i < j \\
\text{'='} & \text{if } i = j \\
\text{'>'} & \text{if } i > j.
\end{cases}
\]

In the direction vector, a component '=' indicates a loop invariant
dependence with respect to the loop of that component. Similarly, a component
'<' or '>' indicates a loop carried dependence with respect to the loop
of that component. A component can have multiple symbols if different instances
of the two statements have different types of dependences. If for a particular
component all directions exist, then it can be replaced by a '*'; i.e. if the (<, =), (<, <)
and (<, >) direction vectors are valid, these can be denoted by the one direction vector
(<, *).
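As a small illustration, the direction vector of one pair of dependent iterations can be computed component by component. The sketch assumes the usual convention that '<' means the source iteration has the smaller index; the function names are hypothetical.

```python
def sig(diff):
    """Direction symbol for one component, given diff = j - i."""
    if diff > 0:
        return "<"
    if diff == 0:
        return "="
    return ">"


def direction_vector(I, J):
    """Direction vector of a dependence from iteration I to iteration J."""
    return tuple(sig(j - i) for i, j in zip(I, J))


# Source iteration (1, 3), sink iteration (2, 3): carried by the outer loop only.
print(direction_vector((1, 3), (2, 3)))
```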

2.3 Dependence distance 

The distance between the dependent iterations of a loop, in terms of the number of
iterations, is the dependence distance.

If $I = (i_1, i_2, \ldots, i_d)$ and $J = (j_1, j_2, \ldots, j_d)$ are the iterations that cause the
dependence, then the distance vector is

\[ DD = (j_1 - i_1, j_2 - i_2, \ldots, j_d - i_d) \]

The distance vector may vary from one instance to another of the dependence
between the same two statements. Dependence distances are required for the
vectorization of serial programs and for converting serial do loops to doacross loops.

2.4 Dependence tests 

Many tests were proposed to solve the dependence problem. Unfortunately, none of 
these tests can handle all the cases that arise in a dependence problem. Some tests 
are best suited to a particular class of problems. Most of the tests are conservative 
in nature in the sense that they predict dependency even in the cases where it may 
not exist at all. Some of the important tests are studied in this section. 
2.4.1 GCD test

The GCD test is the simplest of all the dependence tests. It is based on an elementary
theorem of number theory stating that Equation (2) has an integer solution
if and only if $d = \gcd(a_1, a_2, \ldots, a_n)$ divides $a_0$. By combining it
with Gaussian elimination, the test is extended to a set of linear diophantine
equations [13].

The system of linear diophantine equations can be represented in the matrix
form

\[ XA = C \]

The GCD test finds a unimodular matrix U and an echelon matrix D such that UA = D.
The algorithm to find the unimodular matrix is given below.

Algorithm 2.1: GGCD.

Generalised_GCD(CoefMatrix) {
    for (i=1; i<=NoColumns; i++) {
        From row i+1 to the last row, make all elements in column i
        zero by applying elementary row transformations.

        Apply the same elementary row transformations to the
        unit matrix.
    }
    return (unimodular matrix = result of transforming unit matrix)
}

If an integer matrix T exists satisfying TD = C, then X = TU is a solution to
the system. Since D is an echelon matrix, the equation TD = C can be solved easily.
The drawback of this test is that it does not consider the bounds at all. So if the gcd of
the coefficients is 1, this test always predicts dependence, even when there is
no integer solution within the given range of indices.
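For a single equation, the test reduces to one divisibility check. The following sketch (the function name `gcd_test` is an assumption) shows both a case where independence is proved and a case where the missing bounds make the answer conservative.

```python
from functools import reduce
from math import gcd


def gcd_test(coeffs, a0):
    """Return True if a_1*I_1 + ... + a_n*I_n = a0 has an integer solution.

    Loop bounds are ignored, so True only means "dependence is assumed".
    """
    d = reduce(gcd, (abs(a) for a in coeffs))
    if d == 0:
        return a0 == 0
    return a0 % d == 0


# 2i - 2j = 1: gcd 2 does not divide 1, so the references are independent.
print(gcd_test([2, -2], 1))
# i - j = 100: gcd is 1, so dependence is assumed even if 1 <= i, j <= 10.
print(gcd_test([1, -1], 100))
```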
Example 2.4: The following example is reproduced from [13]. Consider the following
equations.

\[ 3i_1 - 2i_2 - 2j_2 = 2 \]

\[ i_1 - 2i_2 = 1 \]

The coefficient matrix A is

\[ A = \begin{pmatrix} 3 & 1 \\ -2 & -2 \\ -2 & 0 \end{pmatrix} \]

Applying the above algorithm we get the echelon matrix

\[ D = \begin{pmatrix} 1 & 1 \\ 0 & -2 \\ 0 & 0 \end{pmatrix} \]

The equation TD = C becomes

\[ (t_1 \;\; t_2 \;\; t_3) \begin{pmatrix} 1 & 1 \\ 0 & -2 \\ 0 & 0 \end{pmatrix} = (2 \;\; 1) \]

This gives $t_1 = 2$ and $t_1 - 2t_2 = 1$, that is, $-2t_2 = -1$. There is no integer solution for $t_2$. Therefore
the system of equations has no solution.
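The generalized GCD test can be sketched in a few lines using integer row operations only. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the names `echelon` and `solve_system` are hypothetical, and the echelon matrix it produces may differ from the one shown above by further unimodular row operations, which does not change the answer.

```python
def echelon(A):
    """Return (U, D) with U unimodular and D = U*A in row echelon form."""
    m, n = len(A), len(A[0])
    D = [row[:] for row in A]
    U = [[int(i == j) for j in range(m)] for i in range(m)]
    r = 0  # index of the next pivot row
    for c in range(n):
        if r == m:
            break
        while True:
            nz = [i for i in range(r, m) if D[i][c] != 0]
            if len(nz) <= 1:
                break
            # Euclid's algorithm on column c, using row subtractions only.
            p = min(nz, key=lambda i: abs(D[i][c]))
            for i in nz:
                if i != p:
                    q = D[i][c] // D[p][c]
                    D[i] = [x - q * y for x, y in zip(D[i], D[p])]
                    U[i] = [x - q * y for x, y in zip(U[i], U[p])]
        if nz:
            D[r], D[nz[0]] = D[nz[0]], D[r]
            U[r], U[nz[0]] = U[nz[0]], U[r]
            r += 1
    return U, D


def solve_system(A, C):
    """Return an integer row vector X with X*A = C, or None if none exists."""
    U, D = echelon(A)
    m, n = len(A), len(A[0])
    t = [0] * m
    r = 0
    for c in range(n):
        rest = C[c] - sum(t[i] * D[i][c] for i in range(r))
        if r < m and D[r][c] != 0:
            if rest % D[r][c] != 0:
                return None  # t_r would not be an integer
            t[r] = rest // D[r][c]
            r += 1
        elif rest != 0:
            return None
    return [sum(t[i] * U[i][j] for i in range(m)) for j in range(m)]


A = [[3, 1], [-2, -2], [-2, 0]]
print(solve_system(A, [2, 1]))  # the system of Example 2.4
```

For the system of Example 2.4 the sketch finds no integer solution, in agreement with the example.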


2.4.2 Banerjee’s test 


Banerjee's test [13] is based on the intermediate value theorem, which implies that the linear
diophantine equation (2) has a real solution within the bounds (3) if $b_{low} \le a_0 \le b_{up}$,
where $b_{low}$ and $b_{up}$ are the lower and upper bounds, respectively, of the expression
$\sum_{i=1}^{n} a_i I_i$ within the region bounded by (3). This test computes the bounds of the linear
expression $\sum_{i=1}^{n} a_i I_i$ and, if $a_0$ is within these bounds, it predicts dependence. The
bounds of a linear expression in a rectangular space can be calculated as follows.
The lower bound $b_{low}$ occurs when

\[
I_i =
\begin{cases}
l_i & \text{if } a_i \ge 0 \\
u_i & \text{if } a_i < 0
\end{cases}
\]

and the upper bound $b_{up}$ occurs when

\[
I_i =
\begin{cases}
u_i & \text{if } a_i \ge 0 \\
l_i & \text{if } a_i < 0
\end{cases}
\]

where $l_i$, $u_i$ are the bounds of variable $I_i$.
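For rectangular bounds the two extreme values can be computed in one pass over the coefficients. The sketch below is a minimal illustration; the function names are assumptions.

```python
def banerjee_bounds(coeffs, bounds):
    """Lower and upper bound of sum(a_i * I_i) over the box l_i <= I_i <= u_i."""
    b_low = sum(a * (lo if a >= 0 else hi) for a, (lo, hi) in zip(coeffs, bounds))
    b_up = sum(a * (hi if a >= 0 else lo) for a, (lo, hi) in zip(coeffs, bounds))
    return b_low, b_up


def banerjee_test(coeffs, a0, bounds):
    """Return True if a real solution exists within the bounds."""
    b_low, b_up = banerjee_bounds(coeffs, bounds)
    return b_low <= a0 <= b_up


box = [(1, 10), (1, 10)]
# i - j = 100 with 1 <= i, j <= 10: the expression only ranges over [-9, 9].
print(banerjee_bounds([1, -1], box), banerjee_test([1, -1], 100, box))
# 2i - 2j = 1: a real solution exists, so dependence is assumed,
# although no integer solution exists.
print(banerjee_test([2, -2], 1, box))
```

The second call illustrates why the test is conservative: it reasons about real solutions only.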

In case of trapezoidal regions bounds can be calculated from the algorithm given 
below. 

Algorithm 2.2: Banerjee.

Banerjee(P , Q , Dioph) {
    /* P , Q : lower and upper bounds */
    /* Dioph : diophantine equation   */

    Initialize {
        blow = 0
        bup = 0
        k = n
        D = Dioph
        E = Dioph
    }

    while (k >= 1) {
        Eliminate Ik {
            if (D[k] >= 0) blow = blow + D[k] * P[k][0]
            else blow = blow + D[k] * Q[k][0]

            if (E[k] >= 0) bup = bup + E[k] * Q[k][0]
            else bup = bup + E[k] * P[k][0]

            if (k > 1) {
                for (i=1; i<k; i++) {
                    if (D[k] >= 0) D[i] = D[i] + D[k] * P[k][i]
                    else D[i] = D[i] + D[k] * Q[k][i]
                    if (E[k] >= 0) E[i] = E[i] + E[k] * Q[k][i]
                    else E[i] = E[i] + E[k] * P[k][i]
                }
            }
        }
        k = k - 1
    }
    return (blow , bup)
}

This test is conservative in nature. Even though it indicates real solutions within
the bounds, integer solutions may not exist.

Example 2.5: Consider the bounds of 4x + 3y in the region

1 < x < 3

y < 5 - x

Lower and upper bounds of 4x + 3y from the above algorithm are 4 and 18
respectively. But 18 occurs at point (3 , 2) which will not be in the region.

From the above example it is known that in case of trapezoidal region bounds 
calculated from the above algorithm need not be within the region. 

2.4.3 I-test 

The I-test [15] extends both the applicability and the accuracy of the GCD and Banerjee's
tests. Like the GCD test it checks for the presence of integer solutions, and like Banerjee's
test it also considers the bounds of the variables, even if the bounds of many of the
variables are unknown. It produces a definitive result more often than the GCD and
Banerjee's tests. The I-test changes the diophantine equation into an interval equation

\[ a_1 I_1 + a_2 I_2 + \cdots + a_n I_n = [L, U] = [a_l, a_u] \qquad (4) \]

Initially $a_l = a_u = a_0$.

Interval GCD test: Let $a_1, a_2, \ldots, a_n$ be integers, and let $d = \gcd(a_1, a_2, \ldots, a_n)$.
The interval equation (4) has an integer solution if and only if $L \le d\lceil L/d \rceil \le U$.

If the given diophantine equation satisfies the interval GCD test, some $i$, $1 \le i \le n$,
such that $|a_i| \le a_u - a_l + 1$ is chosen, and by eliminating that variable a new
interval is formed as follows.

\[ a_l' = a_l - \max(a_i x_i) \]

\[ a_u' = a_u - \min(a_i x_i) \]

This elimination of variables is repeated until further elimination of variables is
not possible or the interval GCD test fails.

Algorithm 2.3: I-test

Itest(Dioph , Constraints) {
    L = a0
    U = a0
    coeff = {a1 , a2 , . . . , an}
    unknown = {all ai's whose bounds are unknown}

    if (unknown != NULL) {
        u = gcd(unknown)
        if (u == 1) return (TRUE)
    }

    coeff = (coeff - unknown) + u
    while (TRUE) {
        while (there is ai whose limits are known
               and abs(ai) <= U-L+1) {
            Select ai such that its limits are known
                and abs(ai) <= U-L+1
            L = L - max(ai*xi)
            U = U - min(ai*xi)
            coeff = coeff - ai
            if (coeff == NULL) {
                if (L <= 0 and 0 <= U)
                    return (TRUE)
                else return (FALSE)
            }
        }

        d = gcd(coeff)
        if (!(L <= d * ceil(L/d) <= U)) return (FALSE)
        if (d != 1) {
            for all ai in coeff ai = ai / d
            L = ceil(L/d)
            U = floor(U/d)
            if (L > U) return (FALSE)
        }
        else return (maybe)
    }
}

Example 2.6: This is reproduced from [15]. Consider the equation

\[ I_1 - 3I_2 + 7I_3 = 8 \]

subject to the limits

\[ 1 \le I_1 \le 3, \quad 1 \le I_2 \le 2, \quad 1 \le I_3 \le 4. \]

From the GCD test, gcd(1 , -3 , 7) = 1 divides 8. From Banerjee's test the limits are (2 ,
28), and 8 is within the limits, so both the GCD and Banerjee's tests assume dependence.
If we rewrite the equation as an interval equation,

\[ I_1 - 3I_2 + 7I_3 = [8, 8] \]

and eliminate $I_1$, we get

\[ -3I_2 + 7I_3 = [8 - 3, 8 - 1] = [5, 7]. \]

The interval GCD test indicates there may be a solution. Now we can eliminate $I_2$,
which gives

\[ 7I_3 = [5 + 3, 7 + 6] = [8, 13]. \]

A final application of the interval GCD test reveals that there is no solution.
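The elimination steps of Example 2.6 can be replayed with a simplified version of the I-test. The sketch below is an assumption-laden reduction of Algorithm 2.3: it handles only variables with known bounds and non-zero coefficients, skips the normalization by d, and returns `None` for the "maybe" answer. The function names are hypothetical.

```python
from functools import reduce
from math import gcd


def interval_gcd_ok(coeffs, low, up):
    """Interval GCD test: low <= d * ceil(low / d) <= up."""
    d = reduce(gcd, (abs(a) for a in coeffs))
    return low <= d * -(-low // d) <= up


def i_test(coeffs, a0, bounds):
    """Return True (dependence), False (independence) or None (maybe)."""
    low = up = a0
    remaining = list(zip(coeffs, bounds))
    while remaining:
        if not interval_gcd_ok([a for a, _ in remaining], low, up):
            return False
        # A variable may be eliminated only if |a_i| <= U - L + 1.
        pick = next((k for k, (a, _) in enumerate(remaining)
                     if abs(a) <= up - low + 1), None)
        if pick is None:
            return None
        a, (lo, hi) = remaining.pop(pick)
        products = (a * lo, a * hi)
        low, up = low - max(products), up - min(products)
    return low <= 0 <= up


# Example 2.6: I1 - 3*I2 + 7*I3 = 8 with 1<=I1<=3, 1<=I2<=2, 1<=I3<=4.
print(i_test([1, -3, 7], 8, [(1, 3), (1, 2), (1, 4)]))
```

The intermediate intervals are [5, 7] and [8, 13], as in the example, and the last interval GCD check fails.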

2.4.4 Delta test 

The Delta test [2] is an exact multiple-subscript test for commonly occurring coupled
subscripts. All the subscripts of the two array references are partitioned into minimal
coupled groups, and the test is applied to each group of subscripts. If any group does
not have a solution then the whole system has no solution. The main idea behind
the Delta test is that constraints derived from the SIV (single index variable) subscripts are
propagated into the other subscripts in the same coupled group. The Delta test reports
independence if any of the component ZIV (zero index variable) or SIV tests determines
independence. Otherwise it converts all SIV subscripts into constraints and propagates them
into the MIV (multiple index variable) subscripts. This
is repeated until no new constraints are found. Then constraints are propagated for
coupled RDIV (Restricted Double Index Variable) subscripts [2]. Finally, the remaining
MIV subscripts are tested. The results are intersected with the existing constraints. If
the result of the intersection is the empty set then no dependence exists.

Algorithm 2.4: Delta

DeltaTest(Subscripts) {
    Initialize elements of constraint vector CV = NULL
    while (untested SIV subscripts exist) {
        apply SIV test to all untested SIV subscripts
        return (FALSE) or
        derive new constraint vector NCV
        NCV = Intersection(NCV , CV)
        if (NCV == NULL) return (FALSE)
        else if (CV != NCV) {
            CV = NCV
            propagate constraint CV into MIV subscripts,
            possibly creating new SIV or ZIV subscripts
            apply ZIV test to untested ZIV subscripts
            return (FALSE) or continue
        }
    }

    while (untested RDIV subscripts exist)
        test and propagate RDIV constraints

    test remaining MIV subscripts
    intersect resulting direction vectors with CV
    return (FALSE) or direction vector
}

Example 2.7: Consider the following example.

   DO 10 i
      DO 10 j
10       A(i+1 , i+j) = A(i , i+j)
      ENDDO
   ENDDO

Applying the strong SIV test to the first subscript of array A derives the constraint
(i + 1 = i') and a distance of 1. Propagating this constraint into the second subscript
(i + j = i' + j') to eliminate i' gives (j = j' + 1). Once again applying the SIV test to
the resulting subscript gives a distance of -1. Merging these two distances gives the
distance vector (1 , -1).

2.4.5 Lambda test 

The λ test is an approximate test which extends Banerjee's test to multidimensional
array references. Consider the set of equations (1) of Section 2.1. If the equations
are solved independently, each may have integer solutions. However, when they are
treated simultaneously as a system there may not be any integer solution;
that is, the intersection of the solution sets may be empty. The λ test approaches the problem by
formulating it as a less exact problem,

\[ \lambda_1 (f_1(I) - g_1(J)) + \cdots + \lambda_n (f_n(I) - g_n(J)) = 0. \]

If an integer tuple $(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is found for which the above equation cannot be
satisfied within the loop bounds, then there is no integer solution to the system of linear equations.

Example 2.8: Consider the loop nest given below.

DO I = 1 , 50
   DO J = 2 , 50
      A(I , I+1) = A(3*I+J+1 , I+J)
   ENDDO
ENDDO

To test for flow dependence, the linear equations to be solved are

\[ x_1 - 3x_3 - x_4 = 1 \]

\[ x_1 - x_3 - x_4 = -1 \]

with the constraints $1 \le x_1, x_3 \le 50$ and $2 \le x_4 \le 50$. If (-1 , 1) is chosen as the λ
tuple we have

\[ -(x_1 - 3x_3 - x_4) + (x_1 - x_3 - x_4) = -2 \]

The above equation reduces to $2x_3 = -2$, that is, $x_3 = -1$, which violates the constraint
$1 \le x_3 \le 50$. So there is no dependence.
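The computation in Example 2.8 amounts to forming a weighted sum of the equations and applying a bounds check to the result. A minimal sketch follows, with hypothetical function names and a rectangular bounds check standing in for Banerjee's test.

```python
def combine(equations, lambdas):
    """Weighted sum of equations given as (coeffs, rhs) pairs."""
    n = len(equations[0][0])
    coeffs = [sum(l * eq[0][k] for l, eq in zip(lambdas, equations))
              for k in range(n)]
    rhs = sum(l * eq[1] for l, eq in zip(lambdas, equations))
    return coeffs, rhs


def has_real_solution(coeffs, rhs, bounds):
    """Bounds check of sum(a_i * x_i) = rhs over l_i <= x_i <= u_i."""
    low = sum(a * (lo if a >= 0 else hi) for a, (lo, hi) in zip(coeffs, bounds))
    up = sum(a * (hi if a >= 0 else lo) for a, (lo, hi) in zip(coeffs, bounds))
    return low <= rhs <= up


# Variables (x1, x3, x4) of Example 2.8.
equations = [([1, -3, -1], 1), ([1, -1, -1], -1)]
bounds = [(1, 50), (1, 50), (2, 50)]

# Each equation on its own passes the bounds check ...
print([has_real_solution(c, r, bounds) for c, r in equations])
# ... but the combination with the tuple (-1, 1) gives 2*x3 = -2, which fails.
combined = combine(equations, (-1, 1))
print(combined, has_real_solution(*combined, bounds))
```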

2.4.6 Power test 

The Power test [8] works in two phases. It uses the extended GCD algorithm in the first
phase. In addition to determining whether there are any integer solutions to the dependence
equations without considering the loop limits, the extended GCD algorithm yields
formulas that specify the index variables in terms of free variables,
derived from the matrix equation

\[ X = TU, \]

where U is the unimodular matrix resulting from the GCD test. The Power test constructs
lower and upper bounds for each of the free variables of the above matrix equation
using the loop bounds. These give boundaries to the solution space. The Fourier-Motzkin
variable elimination method, modified to find integer solutions, is used to test for
an empty solution space. If the solution space is not empty then an integer
solution exists.

Algorithm 2.5: Power

PowerTest(CoefMatrix , Constraints) {
    Apply Generalized GCD test.
    Solve the matrix equation X = TU to
    give index variables in terms of free variables.
    Substitute these index variables into the
    constraints, which gives constraints on the free variables.
    Validate the resulting solution space using the
    Fourier-Motzkin variable elimination method.
    return (FALSE) or direction vector
}

Fourier-Motzkin variable elimination method is given in Subsection 2.4.7. 
Example 2.9: This example is reproduced from [8]. Consider the following program
segment.

for I = 1 to N do
  for J = I + 1 to N do
    A[I] = A[J]
  endfor
endfor

The dependence equation is $i_1 - j_2 = 0$, and the dependence matrix equation XA = C
is

\[ (i_1 \;\; j_1 \;\; i_2 \;\; j_2) \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix} = (0) \]

The extended GCD algorithm gives

\[ i_1 = t_2, \quad j_1 = t_4, \quad i_2 = t_3, \quad j_2 = t_2 \]

From the loop limits we derive the following bounds on the free variables:

\[ 1 \le t_2 \le N \]
\[ 1 \le t_3 \le \min(t_2 - 1, N) \]
\[ t_2 + 1 \le t_4 \le N \]

Since these are consistent, an integer solution exists. Now suppose we want to test
for a particular dependence direction, such as the (<) direction in the first loop. From
$i_1 < i_2$ we derive the additional bound

\[ t_2 + 1 \le t_3. \]

This is inconsistent with the previous bounds, so there is no dependence with the (<)
direction in the first loop.
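The conclusion of Example 2.9 can be cross-checked by enumeration for a small, fixed N. This is only a sanity check under the assumption of a concrete bound; the Power test itself reasons symbolically. The function name is hypothetical.

```python
def dependences(n):
    """All (i1, j1, i2, j2) where A[i1] written at (i1, j1) is read as A[j2] at (i2, j2)."""
    found = []
    for i1 in range(1, n + 1):
        for j1 in range(i1 + 1, n + 1):
            for i2 in range(1, n + 1):
                for j2 in range(i2 + 1, n + 1):
                    if i1 == j2:  # dependence equation i1 - j2 = 0
                        found.append((i1, j1, i2, j2))
    return found


deps = dependences(5)
print(len(deps) > 0)                              # a dependence exists
print(any(i1 < i2 for i1, _, i2, _ in deps))      # but never with i1 < i2
```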

2.4.7 Omega test 

The Omega test is an exact dependence test. The input to the Omega test is a set of linear
equalities ($\sum a_i x_i = c$) and a set of linear inequalities ($\sum a_i x_i \ge c$). The Omega test first
eliminates all equality constraints, producing a new problem of inequality constraints
that has an integer solution if and only if the original problem had integer solutions.
Then it checks for a contradictory pair of inequalities. If such a pair is found then no
solution exists. Tight inequalities are converted to equalities and are eliminated
as in the original problem. Redundant constraints are eliminated by tightening
them. If the problem involves at most one variable and has passed the above tests, then
an integer solution is reported. Otherwise the Fourier-Motzkin variable elimination method
is used to validate the inequalities. This method eliminates a variable from the linear
programming problem, which projects an n-dimensional object onto its (n-1)-dimensional
shadow. The part of the shadow cast by portions of the object that are at least one unit
thick is called the dark shadow. The algorithm to find the shadows and validate the
inequalities is given below. The curious reader can refer to [14] for more details.

Algorithm 2.6: Omega

Omega(equalities , inequalities) {
    do {
        Normalize the problem. If error, return fail to caller.
        Determine unit variable or least coefficient variable.
        Eliminate unbounded variables' inequalities, recursively.
        Eliminate redundant constraints and add a new constraint.
        If inconsistencies found, return fail to caller.
    }
    while (there are equalities)
    Fourier_Motzkin(resulting_inequalities)
}

Fourier_Motzkin(inequalities) {
    Select a least coefficient variable.
    Calculate real shadow
    Calculate dark shadow
    If (projection is exact)
        call omega on this shadow.
    else {
        If (there are no integer solutions to the real shadow)
            return (FALSE)
        If (there are integer solutions to the dark shadow)
            return (TRUE)
        Apply integer programming.
    }
}

Find_Shadows() {
    z = least_coefficient_variable of problem
    Classify all inequalities into two classes {
        I.  Lower bounds on z.  /* of type e1 <= a*z */
        II. Upper bounds on z.  /* of type e2 >= b*z */
    }

    Calculate real shadow as follows {
        copy main problem into subproblem P1.
        Eliminate all inequalities containing z.
        for (each (i , j)) {
            /* i in class I and j in class II */
            /* ith inequality: e1 <= a*z */
            /* jth inequality: e2 >= b*z */
            new_inequality : b*e1 <= a*e2
            Insert new_inequality into P1.
        }
    }

    Calculate dark shadow as follows {
        copy main problem into subproblem P2.
        Eliminate all inequalities containing z.
        for (each (i , j)) {
            /* i in class I and j in class II */
            /* ith inequality: e1 <= a*z */
            /* jth inequality: e2 >= b*z */
            new_inequality : a*e2 - b*e1 >= (a-1)*(b-1)
            Insert new_inequality into P2.
        }
    }
}
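The difference between the real and the dark shadow shows up already for one lower and one upper bound on a single variable z with constant e1 and e2. The sketch below is illustrative only; the function names are assumptions, and a and b are taken to be positive integers.

```python
def real_shadow(e1, a, e2, b):
    """Necessary condition for a real z with e1 <= a*z and b*z <= e2."""
    return b * e1 <= a * e2


def dark_shadow(e1, a, e2, b):
    """Sufficient condition for an integer z with e1 <= a*z and b*z <= e2."""
    return a * e2 - b * e1 >= (a - 1) * (b - 1)


def has_integer_z(e1, a, e2, b):
    """Exact answer: is there an integer between ceil(e1/a) and floor(e2/b)?"""
    return -(-e1 // a) <= e2 // b


cases = {
    "3 <= 2z, 2z <= 3": (3, 2, 3, 2),  # real shadow only, no integer z
    "3 <= 2z, 3z <= 7": (3, 2, 7, 3),  # inside the dark shadow, z = 2
    "4 <= 2z, 2z <= 4": (4, 2, 4, 2),  # outside the dark shadow, yet z = 2 works
}
for name, args in cases.items():
    print(name, real_shadow(*args), dark_shadow(*args), has_integer_z(*args))
```

The third case is the situation in which the projection is not exact and the algorithm has to fall back on further search.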

2.5 Performance results 

The tiny tool developed by Wolfe [9] is used to study the performance of the various tests.
Around 50 programs written by different users are studied. These programs are
converted to tiny format to make them acceptable to the tiny tool. The converter,
f2t, is also an integrated part of the tiny tool. The converted Fortran programs
are not acceptable to tiny because of I/O statements, GOTO statements and
some others. So these programs are filtered again to produce programs containing
only do, if and assignment statements.

Suresh [11] has also analyzed various tests using test suites of programs
containing a total of around 200 loops. The following tables summarize the results
of the present study and of the work presented by Suresh. The test suites named
NASA NSA and Convex were used by Suresh,
and the My-test suite is used for the present performance study of the tests.

From Table 2 it is observed that many of the array references (73.35%) are
linear. The non-linear references mostly result from one single program (count
6201) in the case of the My-test suite. Surprisingly, most of the coefficients of the variables
in subscript expressions are zero (74.89%), and none of the references contains
coefficients other than 0 and ±1 in the My-test suite. In the other suites also, the number
of non-unit, non-zero coefficients is negligible. Most programs use one
dimensional arrays (92.96%). No program is found with arrays of more than two
dimensions in either the Convex or the My-test suite.

The comparative performance of the various tests is given in Table 3. Among all
the tests only the Omega test performed well. All other tests failed because of the
symbolic constants passed to subroutines. Banerjee's test, while taking much less time,
performed well compared to the other tests, excluding the Omega test. As can be understood
from the presentation of the Power test, even though it generally performs well, here it
failed because of the unknown variables in the bounds. The Power test is able to identify
more cases of independence than the generalized GCD test alone. The time required for the Omega
test is around 18 times that of Banerjee's test.

From the results of the experiments it can be concluded that a restructuring
compiler needs a test that efficiently tests the dependence problem involving linear
coefficients and particularly unknown bounds.


Table 1: Coefficients of subscript expressions

Test suite          0        ±1    Others
My-test        123391     45085         0
NASA NSA        49784     12468        12
Convex           1123       873        10
Percentage     74.89%    25.10%     0.01%


Table 2: Types of subscripts

Test suite     Linear    Non-Linear
My-test         17113          7051
NASA NSA         3130           430
Convex            409            24
Percentage     73.35%        26.65%

Table 3: Frequency of array dimensions

Test suite          Dim-1          Dim-2          Dim-3          Dim-4
My-test       [illegible]    [illegible]              0              0
NASA NSA      [illegible]    [illegible]           1951            462
Convex                400             34              0              0
Percentage         92.96%    [illegible]    [illegible]    [illegible]

Table 4: Performance analysis of dependence tests

Test        Applied    Independence    Time in ms
Banerjee       1437              31          5940
GGcd           1437              19          5240
λ              1437              21         94360
Power          1437              37         10960
Delta          1437              35         32340
Omega          1409            1365        107760

Chapter 3
GA Test


Holland proposed genetic algorithms in the early 1970s as guided randomized
search procedures that mimic the evolutionary process in nature. Genetic
algorithms manipulate a population of potential solutions to a problem. They
operate on encoded representations of the solutions, equivalent to the genetic
material of individuals in nature. Each solution is associated with a fitness
value that indicates its goodness of fit. The various genetic operators, e.g.
crossover, cause the exchange of information among the members of the
population. These genetic operators, coupled with the survival of the fittest
strategy, are employed to locate better solutions and ultimately to solve the
problem.

The feasibility of applying genetic algorithms to dependence testing was first
proposed by Sudheer [3]. The solution as outlined there could provide only a
'yes' or 'no' answer to the question of dependence between pairs of array
references. Later Singhai [12] improved and extended it to determine the
direction vector for restructuring of programs. There are some other important
aspects which are not addressed in either of these works [3, 12]. These include,

• Applicability of GAtest for unknown bounds.

Some loops involve symbolic constants or variables that may depend on input
values as bounds. In both cases, one or both of the bounds of the
corresponding loop index are unknown. The solution for unknown bounds is
presented in Section 3.4.

• Dependence direction for a trapezoidal search space.

Some restructuring techniques like privatization of variables need the
direction vector. Singhai [12] addressed this problem for rectangular bounds.
It is not possible to extend his solution to the trapezoidal space in a trivial
way. In Section 3.5.2 the problem of dependence direction for the trapezoidal
space is considered and a solution is proposed.

• Applicability of GAtest for S²-search spaces.

A new concept called S²-search spaces is introduced in Section 3.7. This
problem is discussed in detail in Section 3.8.

3.1 GA approach to dependence problem 

Recall the linear diophantine equation (2). In view of genetic algorithms, the
dependence problem can be framed as a search problem that finds an integer point
$(x_1, x_2, \ldots, x_n)$, called the optimal point, in the search space defined
by the loop bounds, that satisfies the linear diophantine equation.

Depending on the complexity of the constraints, the search space can be
classified as rectangular, trapezoidal, unknown bounds or S²-search space. These
classes are not mutually exclusive. One class may fully or partially overlap
another.

If a fitness value - oo is associated with every integer point in the search 

space, then the problem is to obtain one of the fittest points, if one exists, in the 
search space. 

The general approach of GAtest to solve the dependence problem is as follows.

1. Apply Banerjee's test. If dependence is ruled out, there is no integer solution.

2. Use genetic algorithms to obtain a semi-integer solution $xp$, where
$xp = (xp_1, xp_2, \ldots, xp_n)$.

3. If the semi-integer solution obtained in step 2 has all integer elements,
$xp_i \in N$ $(1 \le i \le n)$, then $xp$ is the solution vector. Otherwise
divide the search space into two and apply GAtest to both resulting spaces.

The following sections explain these steps in detail for each of the special
cases of the problem, depending on the complexity of the constraints.


3.2 GAtest for rectangular space 

Among all the classes of spaces the simplest one is the rectangular space with
all known bounds. All the bounds are constants and do not involve any
expressions or unknown variables. The bounds of this space can be represented as
in equation (3).

Often Banerjee's test concludes dependence even though there is no integer
solution. It was proved in [12] that Banerjee's test provides necessary and
sufficient conditions for the existence of a semi-integer solution.

If Banerjee's test fails to rule out the existence of an integer solution, then

$$\exists X = (x_1, x_2, \ldots, x_n) \text{ such that } \forall i:\ x_i \in R \text{ and } \sum_i a_i x_i = a_0 .$$

By applying a genetic algorithm we can find a vector $X$ such that

$$\forall i \neq k:\ x_i \in N,\quad x_k \in R \quad\text{and}\quad \sum_i a_i x_i = a_0 .$$

Sometimes it is possible that $x_k$ also belongs to $N$. In that case an integer
solution is found. Otherwise the search space is divided into two subspaces
whose lower and upper bounds are

$$L_1 = \{l_1, l_2, \ldots, l_k, \ldots, l_n\}$$
$$U_1 = \{u_1, u_2, \ldots, \lfloor x_k \rfloor, \ldots, u_n\}$$

and

$$L_2 = \{l_1, l_2, \ldots, \lceil x_k \rceil, \ldots, l_n\}$$
$$U_2 = \{u_1, u_2, \ldots, u_k, \ldots, u_n\}$$

Algorithm 3.1:

 1  GAtest(l, u, a) {
 2      if (Banerjee_test(l, u, a) == TRUE) {
 3          x = get_semi_integer_solution(l, u, a);
 4          if (all xi are integers)
 5              return (TRUE);
 6          else {
 7              let xi be the non-integer element of x
 8              tl = l;
 9              tu = u;
10              tli = ceil(xi);
11              tui = floor(xi);
12              return (GAtest(l, tu, a) || GAtest(tl, u, a));
13          }
14      }
15      else return (FALSE);
    }



Figure 2: Division of two dimensional search space 

In case $X$ is the only real solution within the search space, Banerjee's test
applied to each of the subspaces will rule out the possibility of a solution.
This divide and conquer method is applied recursively until an integer solution
is found or no subspace contains a solution.

Example 3.1: Consider the following linear diophantine equation,

$$3X_1 + 5X_2 = 10$$

and the constraints are

$$1 \le X_1 \le 10$$
$$1 \le X_2 \le 10$$

Banerjee's test gives lower bound = 8 and upper bound = 80. So it predicts
dependence. Application of GA resulted in the semi-integer solution (1, 1.4). At
this point GAtest divided the region {[1, 10], [1, 10]} into two subregions
{[1, 10], [1, 1]} and {[1, 10], [2, 10]}. Banerjee's test again predicts
dependence for the first subregion. The semi-integer solution (1.667, 1) is
found using GA. The subregion is divided further into {[1, 1], [1, 1]} and
{[2, 10], [1, 1]}. Now Banerjee's test finds that there is no solution to the
given linear diophantine equation in either of the subregions produced at
level 2. Application of Banerjee's test to the second subregion at level 1 finds
that there is no solution. So there is no integer solution to the given problem.
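The divide and conquer scheme of this section can be sketched in Python as below. This is a sketch under stated assumptions, not the implementation used in this work: the genetic algorithm is replaced by a deterministic greedy stand-in (semi_integer_solution) that walks from the corner minimizing the expression towards the corner maximizing it, and the right hand side a0 is passed as a separate argument. All function names are hypothetical.

```python
import math
from fractions import Fraction

def banerjee_bounds(l, u, a):
    """Extreme values of sum(a_i * x_i) over the box l <= x <= u."""
    lo = sum(c * (lb if c > 0 else ub) for c, lb, ub in zip(a, l, u))
    hi = sum(c * (ub if c > 0 else lb) for c, lb, ub in zip(a, l, u))
    return lo, hi

def semi_integer_solution(l, u, a, a0):
    """Deterministic stand-in for the GA (an assumption of this sketch).
    Requires lo <= a0 <= hi; at most one element of the result is fractional."""
    corner = [lb if c > 0 else ub for c, lb, ub in zip(a, l, u)]
    need = a0 - sum(c * v for c, v in zip(a, corner))
    x = [Fraction(v) for v in corner]
    for j, c in enumerate(a):
        if need == 0:
            break
        if c == 0:
            continue
        room = abs(c) * (u[j] - l[j])      # how much x_j can still contribute
        step = min(need, room)
        x[j] += Fraction(step, c)          # moves up if c > 0, down if c < 0
        need -= step
    return x

def ga_test(l, u, a, a0):
    """True if sum(a_i * x_i) == a0 has an integer solution in the box."""
    lo, hi = banerjee_bounds(l, u, a)
    if not lo <= a0 <= hi:
        return False                       # Banerjee's test rules it out
    x = semi_integer_solution(l, u, a, a0)
    fractional = [j for j, v in enumerate(x) if v.denominator != 1]
    if not fractional:
        return True
    j = fractional[0]
    tu = list(u); tu[j] = math.floor(x[j])
    tl = list(l); tl[j] = math.ceil(x[j])
    return ga_test(l, tu, a, a0) or ga_test(tl, u, a, a0)
```

Calling ga_test([1, 1], [10, 10], [3, 5], 10) reproduces the conclusion of Example 3.1. Because the stand-in is deterministic, the sequence of subregions it visits differs from the one reported in the example, where the GA chose the split points.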

3.3 GAtest for trapezoidal search spaces 

In a trapezoidal search space, the bounds of an index variable are linear
expressions of the index variables of the outer loops. The general form of the
constraints on the loop indices for this space can be represented as follows.

$$l_{11} \le x_1 \le u_{11}$$
$$l_{21} + l_{22} x_1 \le x_2 \le u_{21} + u_{22} x_1$$
$$\vdots$$
$$l_{n1} + l_{n2} x_1 + \cdots + l_{nn} x_{n-1} \le x_n \le u_{n1} + u_{n2} x_1 + \cdots + u_{nn} x_{n-1}$$

The GA approach to handle the trapezoidal space is to first apply Banerjee's
test. If the existence of an integer solution is not ruled out, then a
circumscribed rectangular space of the trapezoidal space is determined by
adjusting the bounds to rectangular form. Following this, a semi-integer
solution is found as for rectangular spaces, using GAtest for rectangular
space. However, an integer solution for the circumscribed space may not satisfy
the original constraints of the trapezoidal bounds. So the feasibility of the
solution must be checked against the original constraints. When the solution is
a real solution the space must be partitioned into two. Unlike in rectangular
spaces, a variable which contains other variables in its bound expression
cannot be partitioned. So a check must be done sequentially from the beginning
to locate a variable along which a division is possible. A variable should have
more than one value to be divisible. If a variable can have only one value,
then it can be eliminated by substituting the appropriate value in the bound
expressions of the other variables as well as in the linear diophantine
equation. Though such an approach appears to be costly, in practice it is cheap.
There will not be many variables, and many of the subspaces will be eliminated
by Banerjee's test initially. So this approach is applicable.
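One way to obtain an enclosing rectangle and to check feasibility against the original trapezoidal constraints is sketched below. The row encoding [const, c_1, ..., c_{i-1}] for each bound and the function names are assumptions of this sketch. The rectangle is computed by interval arithmetic, so it encloses the trapezoid but is not claimed to be the tightest possible one.

```python
def circumscribed_rectangle(lower, upper):
    """lower[i] and upper[i] are rows [const, c_1, c_2, ...] giving the bound of
    the (i+1)-th variable as const + c_1*x_1 + c_2*x_2 + ... over outer variables."""
    lo, hi = [], []
    for lrow, urow in zip(lower, upper):
        # smallest possible lower bound and largest possible upper bound
        lmin = lrow[0] + sum(c * (lo[j] if c > 0 else hi[j])
                             for j, c in enumerate(lrow[1:]))
        umax = urow[0] + sum(c * (hi[j] if c > 0 else lo[j])
                             for j, c in enumerate(urow[1:]))
        lo.append(lmin)
        hi.append(umax)
    return lo, hi

def satisfies(x, lower, upper):
    """Check a candidate point against the original trapezoidal constraints."""
    for i, (lrow, urow) in enumerate(zip(lower, upper)):
        lb = lrow[0] + sum(c * x[j] for j, c in enumerate(lrow[1:]))
        ub = urow[0] + sum(c * x[j] for j, c in enumerate(urow[1:]))
        if not lb <= x[i] <= ub:
            return False
    return True
```

For 1 <= x1 <= 10 and x1 <= x2 <= x1 + 5 the rectangle is [1, 10] x [1, 15]. The point (1, 15) lies inside the rectangle but violates the trapezoidal bound on x2, which is why the feasibility check is needed.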

3.4 GAtest for unknown bounds 

In some cases the loop bounds can be symbolic constants. This may be true for
both rectangular and trapezoidal spaces. Sometimes only one bound, either the
lower bound or the upper bound, is unknown. At times both the bounds may be
unknown. A variable whose both bounds are unknown is called a free variable. In
the previous chapter it was observed that many loops in subroutines contain
unknown bounds, because the bounds contain some formal variable which is known
only at run time. In such cases the dependence test must be able to decide
whether the dependence exists with the known constraints. These unknown bounds
can be assumed to be some finite value, like the maximum value of the index in
an array declaration for upper bounds and the minimum value of the index for
lower bounds. This is only an estimate of an unknown value, so it is not
accurate enough.

GAtest assumes these unknown bounds to be, theoretically, $-\infty$ for lower
bounds and $+\infty$ for upper bounds. After assuming this, Banerjee's test can
be applied. If a free variable exists it is unnecessary to apply Banerjee's
test, because it is known that the lower and upper bounds of the expression
range from $-\infty$ to $\infty$.

Even in other cases it may happen that the bounds of the expression range from
$-\infty$ to $\infty$. It is known that if there are two free variables and
their coefficients are relatively prime, then there always exists an integer
solution.
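The claim about two free variables follows from the extended Euclidean algorithm, as the following Python sketch illustrates. The helper names are hypothetical, and the coefficients are assumed to be non-zero. Here rest stands for the value that the two free variables still have to produce after the other variables are fixed.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with a*s + b*t == g, where |g| = gcd(a, b)."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def solve_two_free(a1, a2, rest):
    """Integers (x, y) with a1*x + a2*y == rest, or None if no such pair exists.
    Assumes a1 and a2 are non-zero and x, y are unbounded (free variables)."""
    g, s, t = extended_gcd(a1, a2)
    if rest % g != 0:
        return None
    k = rest // g
    return s * k, t * k
```

When a1 and a2 are relatively prime, g is 1 or -1, the divisibility check always passes, and a solution is returned for every value of rest.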

If the dependence is not ruled out by Banerjee's test, GA is applied to get a
semi-integer solution. In the application of GA, every time we assign a value to
a variable we should check that the constraints of that variable, if they exist,
are satisfied. While initializing the population, a free variable is assigned 0,
which is the mid value of the limits assumed. If one of the bounds is known then
the variable can be initialized with some random number that satisfies the known
bound. If both bounds are known it can be initialized normally with a random
value between the lower and upper bounds. The algorithm to find the semi-integer
solution is given below.

Algorithm 3.2:

get_semi_integer_solution(l, u, a)

l, u : Lower and upper bounds
a    : linear diophantine equation.

{
    init_population(l, u);
    while (1) {
        apply_genetic_operator();
        x = select_a_member();
        j = random_number_in_range(1, n);
        xj = (a0 - sum(i != j, ai xi)) / aj;

        if (both bounds are unknown)
            return (semi_integer_solution = x)
        else if (satisfying_known_bounds(xj))
            return (x);
        else {
            if (lj_known && xj < lj)
                xj = lj;
            else    /* uj known and xj > uj */
                xj = uj;
            replace_worst_fit_member_by(x);
        }
    }
}

init_population(l, u)

l, u : Lower and upper bounds

{
    for (i=0; i<population_size; i++) {
        for (j=0; j<number_of_variables; j++) {
            if (both bounds known)
                pop[i].var[j] = random_num_in_range(lj, uj);
            else {
                if (lower bound known)
                    pop[i].var[j] = random_number_above_or_equal(lj);
                else if (upper bound known)
                    pop[i].var[j] = random_number_below_or_equal(uj);
                else
                    pop[i].var[j] = 0;   /* Best estimation */
            }
        }
    }
}
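The initialization rule and the step that solves the equation for one variable can be sketched in Python as follows. The sketch makes several assumptions that are not in the algorithm above: an unknown bound is represented by None, the hypothetical parameter spread limits how far a half-bounded variable is sampled from its known bound, and the right hand side a0 is passed explicitly.

```python
import random
from fractions import Fraction

def init_population(l, u, size, spread=100, rng=random):
    """Build `size` members; None in l or u marks an unknown bound."""
    pop = []
    for _ in range(size):
        member = []
        for lb, ub in zip(l, u):
            if lb is not None and ub is not None:
                member.append(rng.randint(lb, ub))
            elif lb is not None:
                member.append(lb + rng.randint(0, spread))   # any value >= lb
            elif ub is not None:
                member.append(ub - rng.randint(0, spread))   # any value <= ub
            else:
                member.append(0)                             # free variable
        pop.append(member)
    return pop

def solve_for(x, j, a, a0):
    """Replace x[j] so that sum(a_i * x_i) == a0 holds exactly.
    The new x[j] may be fractional, giving a semi-integer candidate."""
    rest = sum(c * v for i, (c, v) in enumerate(zip(a, x)) if i != j)
    y = list(x)
    y[j] = Fraction(a0 - rest, a[j])
    return y
```

For the equation of Example 3.1, solve_for([1, 0], 1, [3, 5], 10) gives the candidate (1, 7/5), which is the semi-integer solution (1, 1.4) quoted there.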

For all the following lemmas and the theorem, the dependence problem described
by equations (2, 3) is assumed. We prove that if only real solutions exist and
there is no integer solution to the problem, then GAtest detects it.

Lemma 3.1: In the set of equations (3), let some $l_k$ be unknown. If there is
no integer solution to the problem then $x_k$ is bounded on the left side.

Proof:

Let the limits of $\sum_{i \neq k} a_i x_i$ be $C_{low}$ and $C_{up}$.

There are two different cases depending on the sign of $a_k$.

Case 1: $a_k > 0$

Lower limit of the expression: $B_{low} = -\infty$

Upper limit of the expression: $B_{up} = C_{up} + a_k u_k$

The linear diophantine equation has a real solution, so $a_0 \le B_{up}$.
Eventually the region division will result in a left subregion in which the
upper limit of $x_k$ is

$$u_l < \left\lfloor \frac{a_0 - C_{up}}{a_k} \right\rfloor - 1 .$$

Now in the left subregion $B_{up} = C_{up} + a_k u_l < a_0$, so the left
subregion can be eliminated. The remaining right subregion has all known
bounds.

Case 2: $a_k < 0$

Similar to Case 1,

$$B_{low} = C_{low} + a_k u_k$$
$$B_{up} = +\infty$$

When the upper limit of $x_k$ in the left subregion reaches a value

$$u_l < \left\lfloor \frac{a_0 - C_{low}}{a_k} \right\rfloor - 1 ,$$

then $B_{low} = C_{low} + a_k u_l > a_0$. So the left subregion can be
eliminated, resulting in a problem in the remaining right subregion with known
bounds.

In both cases a left subregion is eliminated, and hence $x_k$ is bounded.

Lemma 3.2: In the set of equations (3), let some $u_k$ be unknown. If there is
no integer solution to the problem then $x_k$ is bounded on the right side.

Proof:

Let the limits of $\sum_{i \neq k} a_i x_i$ be $C_{low}$ and $C_{up}$. Similar to
Lemma 3.1, this also has two cases depending on the sign of $a_k$.

If $a_k > 0$: if the lower bound in the right subregion reaches
$l_r > \lfloor (a_0 - C_{low}) / a_k \rfloor + 1$, then the lower bound
$B_{low} = C_{low} + a_k l_r > a_0$. So the right subregion can be eliminated.

Similarly if $a_k < 0$, then if $x_k$ reaches a lower limit
$l_r > \lfloor (a_0 - C_{up}) / a_k \rfloor + 1$ in the right subregion, that
subregion is eliminated, resulting in a problem in the left subregion with
known bounds. Hence the lemma.

Lemma 3.3: Let both bounds of a variable $x_k$ in the set of equations (3) be
unknown. If there is no solution to the problem, GAtest detects it.

Proof:

When the variable is divided at some random point it results in two subregions,

• a left subregion with unknown lower bound and known upper bound,

• and a right subregion with known lower bound and unknown upper bound.

Both these cases are the same as those proved in Lemma 3.1 and Lemma 3.2. This
results in regions with known bounds. Hence, from the correctness of GAtest [12]
for known bounds, GAtest detects that no solution exists.

Theorem 3.1 (Exactness of GAtest) Let the linear diophantine equation be
Equation (2) and the bounds be (3), with some $l_i$s and $u_i$s unknown. If the
diophantine equation has no solution then GAtest proves that there is no
solution.

Proof:

Case 1: No integer solution exists for the diophantine equation in any region.

The GCD test detects it: $d = \gcd(a_1, a_2, \ldots, a_n)$ does not divide $a_0$.

Case 2: In the given region only a real solution exists.

Some bounds of some variables are unknown. From the division of the region it is
known that when a variable with unknown bounds is divided, it results in two
subregions with the same or a smaller number of unknown bounds. Eventually this
results in a case where only one variable has unknown bounds. For this case the
correctness of GAtest is proved in Lemma 3.3.

In the worst case, if the same variable is always divided, then one subregion
may always remain with the same number of unknown variables as the original
problem. But in practice, after reaching a limit value that a computer can
represent, it is no more possible to divide the same variable again. So this
will result in division of the next possible variable. So eventually this
results in a problem with a smaller number of unknown values than the original
problem.

Correctness of GAtest for known bounds was proved in [12]. Hence the theorem.

In practice the variables of the linear diophantine equation take only finite
values because the right hand side of the equation is a finite value. So if a
solution exists then, even though bounds are unknown, eventually the algorithm
finds one of the finite solution points.

3.4.1 The worst case 

Consider the following example.

Linear diophantine equation $2x + 4y + \cdots = 25$
and bounds of the region are

$$-\infty < x < +\infty$$
$$-\infty < y < +\infty$$

The dots in the equation represent some more terms whose variables are of no
interest. Assume that there is no integer solution, and that a pseudo random
number generator generated values that always resulted in dividing the variable
x. Then the left subregion never terminates theoretically. But in practice, as
the maximum and minimum of a number in any computer are fixed, in the worst case
it will go up to that boundary. After reaching the boundary value it is no more
possible to divide that variable, so the algorithm continues with the next
variable. Eventually the algorithm terminates. This is only a fictitious
example. Often, if more than two bounds are unknown and range from $-\infty$ to
$\infty$, then an integer solution exists.

Example 3.2: This example is the same as that given in Section 3.2 except that
the bounds are changed as follows.

$$1 \le X_1 \le u_1$$
$$l_2 \le X_2 \le 10$$

The upper bound of $X_1$ and the lower bound of $X_2$ are unknown. When
Banerjee's test is applied after assuming MIN_INT for the lower bound of the
variable $X_2$ and MAX_INT for the upper bound of the variable $X_1$, it gives
that both bounds of the expression $3X_1 + 5X_2$ are unknown. GA finds a
semi-integer solution (1.0, 1.4) satisfying the existing constraints. Now the
region is divided into two subregions $\{[1, u_1], [l_2, 1]\}$ and
$\{[1, u_1], [2, 10]\}$. At this point the left subregion has two unknown
bounds, equal to the number of unknown bounds in the original problem, and the
right subregion has only one unknown bound. The test is applied recursively.
This is continued up to a depth of 6 levels, where the solution (5, -1) is
found.

If we compare the example in Section 3.2 with this example, even though the
latter example seems to be searching in the region
{[1, MAX_INT], [MIN_INT, 10]}, which is many thousands of times larger than the
region of the former example, it took only about 2 times the time of the former
example. In terms of the number of invocations of GA, the known bounds example
took 8 invocations whereas the unknown bounds example took 22.


3.5 Testing for dependence for a given direction vector

Some restructuring methods need more information than is provided by simple
GAtest to determine whether a pair of references causes the dependence. One
important piece of such information is the dependence direction vector. GAtest
in its simplest form cannot generate information regarding dependence direction.
However, this test may be called once for each possible combination of elements
in a direction vector. The test has to be applied $3^n$ times to determine the
possible direction vectors if there are $n$ loops. In this section we develop a
method for finding the dependence direction by modifying the constraints as well
as the diophantine equation. Both rectangular and trapezoidal spaces are
considered.

Let two statements be within a nest of loops of depth $n$. The dependence
problem has to be solved for the direction vector $S = (s_1, s_2, \ldots, s_n)$.

3.5.1 Rectangular search space

The set of original constraints for the rectangular search space is

$$l_i \le I_i, J_i \le u_i \quad \text{for } 1 \le i \le n.$$

• If $s_i$ is '='

$I_i = J_i$. Therefore one of $I_i$ and $J_i$ can be substituted by the other in
the diophantine equation. This results in the elimination of one variable. The
constraints however will not change.

• If $s_i$ is '>'

In this case, the search is to be confined to the space such that $I_i > J_i$.
Thus the constraints for $I_i$ and $J_i$ should be changed to

$$l_i \le J_i \le u_i - 1$$
$$J_i + 1 \le I_i \le u_i$$

• If $s_i$ is '<'

In this case the target search should be such that $I_i < J_i$. That is, the
constraints for $I_i$ and $J_i$ should be changed to

$$l_i \le I_i \le u_i - 1$$
$$I_i + 1 \le J_i \le u_i$$

3.5.2 Trapezoidal search space

Original constraints for both $I$ and $J$ are similar to the equation (5).
Changing the constraints in this case is more complicated compared to the
rectangular spaces. There are two different sets of constraints for $I$ and $J$.
Additional constraints, like $I_i = J_i$ or $I_i < J_i$ or $I_i > J_i$, have to
be imposed in order to satisfy the given direction vector.

Consider the following example, one variable $x_k$ having two constraints.

$$f(x_1, x_2, \ldots, x_{k-1}) \le x_k \le g(x_1, x_2, \ldots, x_{k-1}) \qquad (6)$$

$$f'(y_k) \le x_k \le g'(y_k) \qquad (7)$$

These constraints can be represented on the real line along which $x_k$ takes
its values (we are interested only in the integer points of the line).

Figure 3: Variable having two sets of constraints


From the above figure, the actual range of values that $x_k$ takes changes
depending on the relative positions of the points $f$, $f'$, $g$, $g'$.
Sometimes, if no overlap occurs between the intervals $[f, g]$ and $[f', g']$,
the set of values that $x_k$ can take is NULL, i.e. the space doesn't exist.
Depending on the relative positions of the points, the lower and the upper
bounds of $x_k$ can be adjusted. In figure 3 the new constraints on $x_k$ are
provided by the following inequalities:

$$f(x_1, x_2, \ldots, x_{k-1}) \le x_k \le g'(y_k)$$

Mathematically this can be expressed as

$$\max(f(x_1, x_2, \ldots, x_{k-1}),\ f'(y_k)) \le x_k \le \min(g(x_1, x_2, \ldots, x_{k-1}),\ g'(y_k)). \qquad (8)$$
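Once the bound expressions are evaluated at a given point, equation (8) reduces to an interval intersection. A minimal Python sketch follows. The function name is hypothetical, and returning None for the NULL space is a convention chosen for this sketch.

```python
def combine(f, g, f2, g2):
    """Intersect [f, g] and [f2, g2] as in equation (8).
    Returns the combined (lower, upper) pair, or None if the space is empty."""
    lo, hi = max(f, f2), min(g, g2)
    return (lo, hi) if lo <= hi else None
```

For example, the constraints 1 <= x <= 10 and 4 <= x <= 15 combine to 4 <= x <= 10, while 1 <= x <= 3 and 5 <= x <= 9 leave no valid value.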

Considering the problem of the trapezoidal region,

• If $s_i$ is '='

$I_i = J_i$. One of $I_i$ and $J_i$ can be substituted for the other in the
diophantine equation and in the other constraints. The resulting problem has
one variable less than the original problem.

• If $s_i$ is '>'

In this case the additional constraint $I_i > J_i$ has to be added. In the other
form,

$$J_i + 1 \le I_i \le u_{i0} + u_{i1} J_1 + u_{i2} J_2 + \cdots + u_{i,i-1} J_{i-1}$$

has to be added. The original and this new constraint can be combined together
into one single constraint using equation (8).

$$\max(J_i + 1,\ l_{i0} + l_{i1} I_1 + \cdots + l_{i,i-1} I_{i-1}) \le I_i$$
$$\le \min(u_{i0} + u_{i1} J_1 + u_{i2} J_2 + \cdots + u_{i,i-1} J_{i-1},\ u_{i0} + u_{i1} I_1 + u_{i2} I_2 + \cdots + u_{i,i-1} I_{i-1})$$

• If $s_i$ is '<'

In this case the additional constraint $I_i < J_i$ has to be added. Similar to
the above case the new constraint is

$$I_i + 1 \le J_i \le u_{i0} + u_{i1} I_1 + \cdots + u_{i,i-1} I_{i-1}.$$

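Original and new constraints can be combined to give the following single
constraint:

$$\max(I_i + 1,\ l_{i0} + l_{i1} J_1 + \cdots + l_{i,i-1} J_{i-1}) \le J_i$$
$$\le \min(u_{i0} + u_{i1} I_1 + u_{i2} I_2 + \cdots + u_{i,i-1} I_{i-1},\ u_{i0} + u_{i1} J_1 + u_{i2} J_2 + \cdots + u_{i,i-1} J_{i-1})$$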


3.6 Solution set enumeration 

Sometimes it is necessary to enumerate all solutions, for example to find the
distance vector. Most of the tests do not provide any information regarding the
solution set or distance vector. Some restructuring techniques need the distance
vector too, e.g. conversion of a sequential do loop to a doacross loop.

Finding the distance vector in case of GAtest is done through enumerating the
solution set. The search space is divided into three regions. Assuming that the
current solution is $(x_1, x_2, \ldots, x_n)$, that $I_k$ is the first divisible
variable, and that the current search space is

$$x_i \le I_i \le x_i \ (1 \le i < k), \qquad l_i \le I_i \le u_i \ (k \le i \le n),$$

the three regions of search space are:

$$x_i \le I_i \le x_i \ (1 \le i < k), \qquad l_k \le I_k \le x_k - 1, \qquad l_i \le I_i \le u_i \ (k < i \le n),$$

$$x_i \le I_i \le x_i \ (1 \le i < k), \qquad x_k + 1 \le I_k \le u_k, \qquad l_i \le I_i \le u_i \ (k < i \le n),$$

and

$$x_i \le I_i \le x_i \ (1 \le i \le k), \qquad l_i \le I_i \le u_i \ (k < i \le n).$$


The problem of solution enumeration through GAtest is also addressed in [12].
There the original search space was divided into two subregions instead of
three. Suppose the solution found is $(x_1, x_2, x_3, \ldots, x_n)$ and the
first divisible variable is $I_k$; then the two subregions according to [12] are

$$x_i \le I_i \le x_i \ (1 \le i < k), \qquad l_k \le I_k \le x_k - 1, \qquad l_i \le I_i \le u_i \ (k < i \le n)$$

and

$$x_i \le I_i \le x_i \ (1 \le i < k), \qquad x_k + 1 \le I_k \le u_k, \qquad l_i \le I_i \le u_i \ (k < i \le n).$$
GAtest will now be applied recursively on the two subregions. This does not
generate the entire solution set, as indicated by the simple example below.

Example 3.3: Let the diophantine equation be

$$2x + 3y - 4z = 12$$

and the bounds of the variables be

$$1 \le x \le 10$$
$$1 \le y \le 10$$
$$1 \le z \le 10$$

Suppose at some point of time we got an integer solution (6, 4, 3). Now
according to [12] division of the region would result in the two subregions

$$1 \le x \le 5, \quad 1 \le y \le 10, \quad 1 \le z \le 10$$

and

$$7 \le x \le 10, \quad 1 \le y \le 10, \quad 1 \le z \le 10.$$

After applying GAtest recursively some more possible solutions may be obtained.
However, if one observes carefully there is a solution (6, 8, 6) which does not
belong to either of the two subregions. The reason is that the specific value of
the divisible variable which determined the partition is neglected. It is
possible that other solutions exist in which the divisible variable has the same
value, with a different combination of values for the remaining variables.

But there is a problem with the technique of dividing the search space into
three subregions. It is possible that the same solution is found repeatedly. In
the third region the divisible variable has only one value. It can be
substituted in the diophantine equation and eliminated. So the total number of
repetitions is limited to the number of variables.

In the following algorithm a divisible variable is one that

• has more than one point,

• has no other variables in its bound expressions.

Algorithm 3.3: For enumeration of the solution set, Line 4 of Algorithm 3.1
should be replaced by

    if (all xi are integers) {
        Solutionset = Solutionset + x;
        j = select first divisible variable;

        if (divisible variable found) {
            tl = l;
            tu = u;
            tlj = xj+1;
            tuj = xj-1;

            GAtest(l, tu, a);
            GAtest(tl, u, a);

            Propagate first divisible variable value (in solution).
            Form resulting constraints ptl, ptu and
            diophantine equation pa.
            (p indicates values after divisible variable is propagated)

            GAtest(ptl, ptu, pa);
        }
    }
    else
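The difference between the two-way and the three-way division can be checked on Example 3.3 with the Python sketch below. It rests on two assumptions that are not part of the algorithm above: GAtest is replaced by a brute-force search (find_one), and instead of propagating the value of the divisible variable into the equation, the third region simply pins that variable to a single point, which has the same effect for rectangular bounds. All names are hypothetical.

```python
from itertools import product

def find_one(l, u, a, a0):
    """Brute-force stand-in for GAtest: first integer solution in the box, or None."""
    for x in product(*(range(lb, ub + 1) for lb, ub in zip(l, u))):
        if sum(c * v for c, v in zip(a, x)) == a0:
            return x
    return None

def enumerate_solutions(l, u, a, a0, three_way=True):
    """Collect solutions by recursive division around each solution found."""
    found = set()

    def visit(l, u):
        x = find_one(l, u, a, a0)
        if x is None:
            return
        found.add(x)
        # first divisible variable: its range holds more than one point
        j = next((i for i in range(len(l)) if l[i] < u[i]), None)
        if j is None:
            return
        left_u = list(u); left_u[j] = x[j] - 1
        right_l = list(l); right_l[j] = x[j] + 1
        visit(l, left_u)                    # region with I_j <= x_j - 1
        visit(right_l, u)                   # region with I_j >= x_j + 1
        if three_way:
            fixed_l = list(l); fixed_u = list(u)
            fixed_l[j] = fixed_u[j] = x[j]  # region with I_j == x_j
            visit(fixed_l, fixed_u)

    visit(list(l), list(u))
    return found
```

With a = (2, 3, -4), a0 = 12 and all bounds 1 to 10, the three-way version returns the complete solution set, including both (6, 4, 3) and (6, 8, 6), whereas the two-way version returns only a proper subset of it.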


3.7 S²-spaces

In most do loops the loop index is incremented by some constant value in each
iteration. If the loop increment is non-unit, the region described by the
constraint equations contains some invalid integer points. Loop control never
goes through these invalid points. For example, for the loop

DO i=1, 20, 2
  DO j=1, 20, 2
    A(i, j) = . . .
  ENDDO
ENDDO

the loop constraints are

$$1 \le i \le 20,$$
$$1 \le j \le 20.$$

The execution of the above loop does not go through the points (2, 2), (2, 4),
. . . , though the constraints show the existence of these integer points. Such
loops with non-unit increments can be easily normalized to unit increment:

DO i=1, 10
  DO j=1, 10
    A(2*i-1, 2*j-1) = . . .
  ENDDO
ENDDO
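The normalization of a loop with a constant positive increment can be sketched in Python as follows. The function name and the returned pair are choices made for this sketch.

```python
def normalize(lower, upper, step):
    """Normalize DO i = lower, upper, step (step > 0) to a unit-increment loop.
    Returns (count, to_original): the new index k runs over 1..count and
    to_original(k) gives the value of the original index."""
    count = max(0, (upper - lower) // step + 1)
    return count, (lambda k: lower + (k - 1) * step)
```

For DO i=1, 20, 2 this gives count = 10 and the mapping k to 2*k - 1, so the normalized loop visits exactly the original points 1, 3, ..., 19.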

However, some while loops, the general form of for loops in C programs, usually
have a recurrence relation on the index variable. This type of loop has many
invalid points between the lower and upper bounds. Moreover these loops cannot
be normalized to have any constant increment. Such search spaces, with only a
few valid points, are called 'sparse search spaces' or S²-spaces.


3.7.1 Representation of S²-spaces

In general, the recurrence relation of the index can be a polynomial expression.
Lower and upper bounds need not be simple constants; they can be of trapezoidal
or triangular form. Let us assume that there are $n$ nested for-loops with index
variables $I_1, I_2, \ldots, I_n$. For simplicity assume that each loop's lower
and upper bounds are linear expressions of the index variables of the outer
loops (trapezoidal form) and that the recurrence relations are polynomial
expressions of that index. Then the search space can be represented as follows,

$$l_{00} \le I_1 \le u_{00} \text{ and } I_1 = r_1(I_1),$$
$$l_{10} + l_{11} I_1 \le I_2 \le u_{10} + u_{11} I_1 \text{ and } I_2 = r_2(I_2),$$
$$\vdots$$
$$l_{n-1,0} + \cdots + l_{n-1,n-1} I_{n-1} \le I_n \le u_{n-1,0} + \cdots + u_{n-1,n-1} I_{n-1} \text{ and } I_n = r_n(I_n) \qquad (9)$$

where $r_i$ is the recurrence relation of the index of the $i$th loop.


3.8 GAtest for S^3-spaces

None of the currently known tests is applicable to S^3-spaces. Only GAtest, by its
nature of testing at points and dividing the region, can be adapted to S^3-spaces.
Given the lower and upper bounds (l and u) and the recurrence relation (r) of a loop
variable, define three functions on a number a between the lower and upper bounds:

1. (a <=) = b such that b ≤ a, r(b) > a and b is in the sequence of numbers
generated by the recurrence relation r.

2. (a >) = b such that b = r((a <=)), if b ≤ u. Undefined otherwise.

3. (a <) = b such that r(b) ≥ a, b < a, if b ≥ l, and b is in the sequence of
numbers generated by the recurrence relation r. Undefined otherwise.
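The three functions can be sketched as follows, under the assumption that the sequence generated by r between the bounds has been precomputed as a sorted list `points`. The function names are hypothetical, and `None` stands for "undefined".

```python
from bisect import bisect_left, bisect_right


def floor_valid(points, a):
    """(a <=): largest valid point b with b <= a, or None if there is none."""
    k = bisect_right(points, a)
    return points[k - 1] if k > 0 else None


def next_valid(points, a):
    """(a >): smallest valid point strictly greater than a, or None past the upper bound."""
    k = bisect_right(points, a)
    return points[k] if k < len(points) else None


def prev_valid(points, a):
    """(a <): largest valid point strictly less than a, or None below the lower bound."""
    k = bisect_left(points, a)
    return points[k - 1] if k > 0 else None


# Valid points of the illustrative recurrence i = 2*i + 2, starting at 1, bounded by 100.
points = [1, 4, 10, 22, 46, 94]
print(floor_valid(points, 12), next_valid(points, 12), prev_valid(points, 10))
```

Binary search on a precomputed list is a design choice of this sketch; the same results could be obtained by iterating the recurrence from the lower bound.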

3.8.1 Distance between two numbers of a sequence 

If a and b are any two numbers selected from the sequence of numbers generated by
the recurrence relation r, the distance between a and b is the number of valid points
of the search space between a and b, including b. We denote this distance
by dist(a, b). It plays the same role in S^3-spaces as the distance between two
solutions does in normal search spaces.
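As a small illustration, assuming again that the valid points are available as a sorted list and that a ≤ b are both valid points, the distance is simply a difference of positions in the sequence. The name `dist` follows the text; the list-based representation is an assumption of this sketch.

```python
def dist(points, a, b):
    """Number of valid points after a up to and including b.

    Both a and b must be valid points with a <= b; list.index raises
    ValueError if either is not in the sequence.
    """
    return points.index(b) - points.index(a)


points = [1, 4, 10, 22, 46, 94]   # illustrative sequence for i = 2*i + 2
print(dist(points, 4, 46))        # counts 10, 22 and 46
```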
3.8.2 Dependence problem in S^3-spaces

Let the search space be the same as that described in Section 3.7.1. The diophantine
equation to be solved is (2). The problem is to search for an integer solution to
equation (2) in the search space (9).

3.8.3 Finding integer solution

The constraints are trapezoidal, so Banerjee's test can be used to find the lower
and upper bounds of the expression a_1 I_1 + a_2 I_2 + ... . If a_0 lies between these limits
then Banerjee's test says that there exists a solution to the diophantine equation.
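The bound check can be sketched for the simpler rectangular case, where every variable has constant bounds. This is an assumption of the sketch: the trapezoidal case treated in the text needs the full form of Banerjee's test. The function names are hypothetical.

```python
def banerjee_bounds(coeffs, lower, upper):
    """Minimum and maximum of sum(coeffs[i] * x[i]) over lower[i] <= x[i] <= upper[i]."""
    lo = sum(a * (lower[i] if a >= 0 else upper[i]) for i, a in enumerate(coeffs))
    hi = sum(a * (upper[i] if a >= 0 else lower[i]) for i, a in enumerate(coeffs))
    return lo, hi


def may_depend(coeffs, a0, lower, upper):
    """True if a0 lies within the bounds, i.e. the dependence cannot be ruled out."""
    lo, hi = banerjee_bounds(coeffs, lower, upper)
    return lo <= a0 <= hi


# Illustrative equation x1 + 2*x2 = 25 with 1 <= x1 <= 100 and 2 <= x2 <= 100.
print(banerjee_bounds([1, 2], [1, 2], [100, 100]))
print(may_depend([1, 2], 25, [1, 2], [100, 100]))
```

A positive coefficient reaches its minimum at the lower bound and a negative one at the upper bound, which is all the rectangular case requires.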

If Banerjee's test fails to rule out the dependence, GAtest is applied. The
trapezoidal bounds are changed to the bounds of a circumscribed rectangular space.
A semi-integer solution is then found using the following steps.

Finding semi-integer solution: Given the rectangular bounds and recurrence
relations, GAtest first initializes the population.

for (j=0; j<populationsize; j++)
{
    for (i=0; i<no_of_variables; i++)
    {
        rnum = random_number_in_the_range(lower[i], upper[i])
        member[j].var[i] = (rnum <=)
    }
}

It then selects a member and applies genetic operations such as mutation. The
fitness of the resulting member is evaluated, and the new member is
included in the population by replacing the worst-fit member.

member = select_member()
rnum = random_number_in_the_range(1, number_of_variables)
for all i except i = rnum find sum of a[i] * member.var[i]
member.var[rnum] = validated_var_value((a0 - sum) / a[rnum])
if (have_enough_fitness(member)) include(member)

The validation of a variable is done by checking that the value lies in the
sequence of numbers generated by the corresponding recurrence relation. The value
is first checked against the bounds and, if it is within the bounds, the (<=) operation
is applied. Otherwise it is set to the lower or upper bound as appropriate. If the
value before validation is within the bounds then the solution is a semi-integer solution.
Otherwise the genetic operators are applied repeatedly until a semi-integer solution
is found.
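The validation step can be sketched as follows. The name `validated_var_value` is taken from the pseudocode above; the sorted list `points` of valid values and the returned flag are assumptions made for illustration.

```python
from bisect import bisect_right


def validated_var_value(points, value):
    """Align value to a valid point.

    Returns (aligned_value, within_bounds). within_bounds is True when the
    value before validation lay between the lower and upper bounds, which is
    the condition for a semi-integer solution.
    """
    lower, upper = points[0], points[-1]
    if value < lower:
        return lower, False
    if value > upper:
        return upper, False
    # Within bounds: apply the (<=) operation.
    return points[bisect_right(points, value) - 1], True


points = [1, 4, 10, 22, 46, 94]   # illustrative sequence for i = 2*i + 2
print(validated_var_value(points, 12.5))
print(validated_var_value(points, 200))
```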

If the solution found satisfies all the rectangular bounds, it must be
checked against the original trapezoidal constraints. If the solution found is only a
semi-integer solution, the region must be divided into two subregions. In order to divide
the region, a variable satisfying the following constraints is selected.

• Its lower and upper bound expressions should not contain any other variable.

• dist(lb_i, ub_i) > 1.

As with trapezoidal regions, if the lower and upper bounds of a variable are equal
then that variable can be eliminated by propagating its value.

GAtest is applied recursively to the divided search spaces. This process is continued
until an integer solution is found or all subspaces have no solutions. The detailed
algorithm is given below.
Algorithm 3.4:

BOOL GATest(ltb, utb, diequ)
{
    if (BanerjeeTest(ltb, utb, diequ) == TRUE) {
        ConvertToRectangularBounds(ltb, utb, lrb, urb)
        SemiIntSol = FindSemiIntSol(lrb, urb, diequ)
        if (ValidIntSol(SemiIntSol))
            return (TRUE)
        else {
            DivVar = SelectDivisibleVar()
            DevideSearchSpace(ltb, utb, ltb1, utb1, ltb2, utb2, DivVar,
                              SemiIntSol, diequ)
            Result = GATest(ltb1, utb1, diequ)
            if (Result == FALSE)
                Result = GATest(ltb2, utb2, diequ)
            return (Result)
        }
    }
    else return (FALSE)
}
Solution FindSemiIntSol(lrb, urb, diequ)
{
    InitPopulation()
    while (SemiIntegerSolution not found) {
        ApplyGeneticOperators()
        Member = SelectMember()
        VarNum = RandomValueBetween(1, NoOfVariables)
        Sum = sum, for all i except i = VarNum, of
                  CoefOf(i, diequ) * Member.Vars[i]
        TempVal = (ConstantTerm(diequ) - Sum) / CoefOf(VarNum, diequ)
        if (TempVal is in between lower and upper bounds of VarNum) {
            Member.Vars[VarNum] = TempVal
            Solution = Member.Vars
            return (Solution)
        }
        if (TempVal < lower bound)
            Member.Vars[VarNum] = lower bound
        else    /* TempVal > upper bound */
            Member.Vars[VarNum] = upper bound
        InsertIntoPopulation(Member)
    }
}
DevideSearchSpace(ltb, utb, ltb1, utb1, ltb2, utb2, DivVar, Solution, diequ)
{
    ltb1 = ltb
    utb1 = utb
    ConstantTerm(utb1) = (Solution[DivVar] <=)
    if (ltb1[DivVar] == utb1[DivVar])
        Propagate(DivVar, diequ, ltb1, utb1)

    ltb2 = ltb
    utb2 = utb
    ConstantTerm(ltb2) = (Solution[DivVar] >)
    if (ltb2[DivVar] == utb2[DivVar])
        Propagate(DivVar, diequ, ltb2, utb2)
}
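The recursion of Algorithm 3.4 can be illustrated by the following simplified sketch. It is not the implementation described here: it assumes rectangular bounds only (no trapezoidal re-check), non-zero coefficients, valid points precomputed as one sorted list per variable (`domains`), and plain random sampling standing in for the genetic operators. The fallback split used when no semi-integer solution is sampled is an addition of this sketch to guarantee termination. All names are hypothetical.

```python
import random


def ga_test(coeffs, a0, domains, rng=None, tries=200):
    """Return a solution of sum(coeffs[i] * x[i]) == a0 with x[i] in domains[i], or None."""
    rng = rng or random.Random(0)
    # Banerjee-style bound check on the rectangular region.
    lo = sum(a * (d[0] if a > 0 else d[-1]) for a, d in zip(coeffs, domains))
    hi = sum(a * (d[-1] if a > 0 else d[0]) for a, d in zip(coeffs, domains))
    if not lo <= a0 <= hi:
        return None
    k, value = None, None
    for _ in range(tries):
        # Random member; solve the equation for one randomly chosen variable.
        member = [rng.choice(d) for d in domains]
        j = rng.randrange(len(coeffs))
        rest = sum(a * x for i, (a, x) in enumerate(zip(coeffs, member)) if i != j)
        v = (a0 - rest) / coeffs[j]
        if domains[j][0] <= v <= domains[j][-1]:
            if v in domains[j]:            # integer solution on a valid point
                member[j] = int(v)
                return member
            k, value = j, v                # semi-integer solution: divide here
            break
    if k is None:
        # Sketch-only fallback: halve the largest domain.
        k = max(range(len(domains)), key=lambda i: len(domains[i]))
        if len(domains[k]) == 1:
            return None
        value = domains[k][len(domains[k]) // 2 - 1]
    left = [p for p in domains[k] if p <= value]    # up to (value <=)
    right = [p for p in domains[k] if p > value]    # from (value >)
    for part in (left, right):
        sub = domains[:k] + [part] + domains[k + 1:]
        found = ga_test(coeffs, a0, sub, rng, tries)
        if found:
            return found
    return None


domains = [[1, 4, 10, 22, 46, 94], list(range(2, 101))]
print(ga_test([1, 2], 25, domains))
print(ga_test([2, 2], 25, domains))
```

Each division strictly shrinks one variable's list of valid points, so the recursion terminates, and no valid point is lost by a division, so the sketch stays exact on the rectangular region.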

Example 3.4: Consider the example having bounds

$$
\begin{aligned}
1 \le I_1 \le 100 &\quad\text{and}\quad I_1 = 2 I_1 + 2, \\
1 + I_1 \le I_2 \le 100 &\quad\text{and}\quad I_2 = I_2 + 1,
\end{aligned}
$$

and the diophantine equation

$$
I_1 + 2 I_2 = 25.
$$

GAtest found the solution (1, 12) within one invocation of the algorithm. Division of
the region was not needed at all.
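The example can be cross-checked by brute-force enumeration of the sparse search space. The sketch below assumes that the first recurrence reads I1 = 2*I1 + 2 starting from 1 and that I2 runs in unit steps from 1 + I1 to 100; the function name `solutions` is hypothetical.

```python
def solutions(a1, a2, a0):
    """Enumerate all (I1, I2) in the sparse space with a1*I1 + a2*I2 == a0."""
    found = []
    i1 = 1
    while i1 <= 100:
        for i2 in range(1 + i1, 101):      # trapezoidal bound 1 + I1 <= I2 <= 100
            if a1 * i1 + a2 * i2 == a0:
                found.append((i1, i2))
        i1 = 2 * i1 + 2                     # recurrence of the outer index
    return found


print(solutions(1, 2, 25))   # [(1, 12)]
```

Under these assumptions the enumeration yields the single pair (1, 12), the same solution reported for GAtest.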

As with normal trapezoidal search spaces, in S^3-spaces it is also
possible to enumerate the solution set by changing some portion of the algorithm,
and hence to find the dependence distance.

This idea can be extended to non-linear equations too. For non-linear
equations, if the minimum and maximum values can be found by
some cost-effective method, if one exists at all, then Banerjee's test can be replaced
by that method. But this test is limited by the fact that in GA it must be
possible to find the value of any variable given the values of the remaining (n − 1)
variables. In practice most array references are linear, and the recurrence
relations and the lower and upper bound expressions are also linear.


3.9 Performance of GAtest 


For rectangular and trapezoidal bounds (normal search spaces) it was found in
[12] that the cost of GAtest is

$$
C(GA(l,u))
\begin{cases}
= C(B) & \text{if Banerjee's test rules out the dependence,} \\
\le C(B) + C(GA_{\text{semi-integer}}(l,u)) + C(GA(l,tu)) + C(GA(tl,u)) & \text{otherwise,}
\end{cases}
$$

where (l, u) is the bound of the search space, and tl and tu are the new bounds of the
divided search space. In the worst case the cost is O(N * c), where N is the size of the
search space and c is a constant that depends mostly on the time required to find a
semi-integer solution.

For S^3-spaces the basic algorithm is the same, so the constituents of the
cost are the same, but the time required to find the semi-integer solution increases
because every random value assigned to a parameter of the population
must be checked against, or aligned to, the nearest valid point in the search space.

The following table compares the performance of GAtest and the Omega test for the
dependence problem with unknown bounds. Each test is applied 100 times in each case
to get accurate time values. All times are shown in milliseconds.
Table 5: Comparison of GAtest and Omega test

Problem    GAtest time for 100    Omega test time for 100    Improvement of GAtest
           executions (ms)        executions (ms)            over Omega
sp29.t     102                    142                        28.1%
sp31.t     35                     39                         10.2%
sp43.t     25                     30                         25.0%
sp5.t      35                     40                         12.5%
sp50.t     25                     31                         19.3%
sp51.t     35                     50                         30.0%
sp52.t     78                     89                         12.4%
           70                     83                         15.7%
sp65.t     25                     35                         28.6%
sp66.t     38                     67                         43.3%
sp7.t      42                     47                         10.6%
sp71.t     34                     42                         19.0%
sp72.t     35                     40                         12.5%
sp77.t     39                     66                         40.9%
sp79.t     41                     59                         30.5%
sp80.t     66                     99                         33.3%
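The improvement column appears to be consistent, up to rounding, with the relative saving (Omega time − GAtest time) / Omega time. The sketch below is an inference from the listed numbers, not a formula stated in the text; the function name is hypothetical.

```python
def improvement(ga_ms, omega_ms):
    """Relative time saving of GAtest over the Omega test, in percent."""
    return 100.0 * (omega_ms - ga_ms) / omega_ms


# A few rows of the comparison table: (GAtest ms, Omega ms).
for ga_ms, omega_ms in [(35, 50), (35, 40), (66, 99)]:
    print(ga_ms, omega_ms, round(improvement(ga_ms, omega_ms), 1))
```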
The following table gives the time requirements of GAtest for S^3-spaces.

Table 6: Timing of GAtest for S^3-spaces

Program    Time (in ms) for 100 executions
prob1      387
prob2      167
prob3      331
prob4      781
Chapter 4

WHILE Loop Parallelization


Most parallelizing compilers treat WHILE loops and DO loops with conditional
exits as sequential constructs because their iteration space is unknown. In many
Fortran programs, loops of the if-then-goto type exist in addition to normal WHILE
constructs. In the for loops of C programs the index variable may, in general, follow a
recurrence relation, in which case the iteration space is also unknown. Some of these
loops can be executed in parallel. Parallelization of WHILE loops is well studied in [5],
and this work is based on the techniques presented there. The implementation
details are presented in Section 4.4.

4.1 Problems in parallelizing WHILE loops 

In the most general form, a WHILE loop consists of the following three parts 

1. one or more recurrences that can be detected at compile time, 

2. a remainder whose dependence structure can be either analyzed statically or 
is unknown at compile time, 

3. one or more termination conditions. 

Initialize Dispatcher
while (Not termination condition)
    Do work()
    Dispatcher = next dispatcher()
enddowhile

Figure 4: WHILE loop

Assuming that there are no cross-iteration data dependences in the remainder, there
are two potential problems in the parallelization of WHILE loops:


• Evaluating the recurrences: 

If the recurrences cannot be evaluated in parallel, then the iterations must be
started sequentially after evaluating the recurrences for each iteration, leading,
in the best case, to a pipelined execution.

• Evaluating the termination conditions:

If the termination condition cannot be evaluated independently by all iterations,
the parallelized WHILE loop could continue to execute beyond the
point where the original sequential loop would stop, i.e. it can overshoot.

Data dependence analysis is itself a very complex problem in WHILE loop
parallelization: because the iteration space is unknown, it is difficult to perform
dependence analysis at compile time. In [6] a run time technique called the privatizing
DO ALL test is proposed to analyze the data dependence problem with complex
equations. This can be used for the dependence problem of WHILE loops whose
iteration space is unknown. GAtest can handle some of the dependence problems of
WHILE loops at compile time. This is presented in Section 4.4.1.
Table 7: Taxonomy of WHILE loops and their dispatcher's potential for parallel execution

                                           Dispatcher
Loop          Monotonic             NonMonotonic          Associative           General
terminator    Induction             Induction             Recurrence            Recurrence
              Overshoot  Parallel   Overshoot  Parallel   Overshoot  Parallel   Overshoot  Parallel
RI            NO         YES        YES        YES        NO         YES-PP     NO         NO
RV            YES        YES        YES        YES        YES        YES-PP     YES        NO


4.2 Recurrence relations of a WHILE loop 

A WHILE loop can have several dependent or independent recurrences. The dominating
recurrence, which precedes the rest of the computation in the dependence
graph, is called the dispatching recurrence or simply the dispatcher [5]. The dispatcher
can be a simple induction or an associative recurrence relation. An induction
dispatcher can be calculated independently using the closed form solution of the
induction. An associative recurrence relation can be parallelized by parallel prefix
calculation techniques. Sometimes the dispatcher must be calculated sequentially, e.g.
a pointer used to traverse a linked list.
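The first two dispatcher kinds can be sketched as follows. The function names and sample recurrences are illustrative assumptions. The prefix computation is written sequentially with `itertools.accumulate`; it is the associativity of the operator that would let a parallel prefix algorithm produce the same terms.

```python
from itertools import accumulate
import operator


def induction_terms(start, step, count):
    """Induction dispatcher d = d + step: term k has the closed form start + k*step,
    so every iteration can compute its own dispatcher value independently."""
    return [start + k * step for k in range(count)]


def associative_terms(start, factors):
    """Associative recurrence d = d * f_k: all terms are prefix products."""
    return list(accumulate(factors, operator.mul, initial=start))


print(induction_terms(1, 3, 4))
print(associative_terms(1, [2, 2, 2]))
```

A general dispatcher such as a linked-list pointer has neither a closed form nor an associative operator, which is why it must be evaluated sequentially.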

The terminator can be remainder invariant (RI), i.e., dependent only on the
dispatcher and on values computed outside the loop, or remainder variant
(RV), i.e., dependent on some value computed inside the loop. If the terminator is RV,
then iterations beyond the last valid iteration could be executed in a parallel
execution of the loop. Overshooting may also occur if the dispatcher is an
induction or an associative recurrence and the terminator is RI. Depending upon
the terminator type and the dispatcher type, WHILE loops can be classified as shown
in Table 7.

Various techniques such as strip mining [7] and loop distribution [5], together with
several optimizations for the distributed loops, have been proposed to parallelize
WHILE loops. Essentially, the techniques proposed in [5] work by distributing the
original WHILE loop into the following two loops:

1. A loop that evaluates the terms of the dispatcher and any termination condition
that is strongly connected to the dispatcher.

2. A loop consisting of the remainder and its associated termination condition.

For the optimizations on these distributed loops for each type of dispatcher in
Table 7, the reader can refer to [5].
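The distribution into two loops can be sketched as below, assuming a remainder-invariant terminator and a remainder without cross-iteration dependences. The function and parameter names are hypothetical, and the second loop is written as an ordinary list comprehension where a parallelizing compiler would emit a DO ALL.

```python
def distributed_while(start, next_dispatcher, terminated, work):
    """Run a WHILE loop as a dispatcher loop followed by a remainder loop."""
    # Loop 1: evaluate the dispatcher terms and the termination condition.
    terms = []
    d = start
    while not terminated(d):
        terms.append(d)
        d = next_dispatcher(d)
    # Loop 2: the remainder; iterations are independent of each other,
    # so this loop is a candidate for DO ALL execution.
    return [work(t) for t in terms]


# Hypothetical loop: d = 1; while (d <= 100) { work(d); d = 2*d + 2; }
print(distributed_while(1, lambda d: 2 * d + 2, lambda d: d > 100, lambda d: d * d))
```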

4.3 Undoing the iterations that overshoot the termination condition

The easiest way to undo overshooting iterations is to checkpoint before loop execution
starts and to record the iteration number whenever a memory location is updated. When
the loop terminates, the results of all iterations below the iteration that caused
termination are copied back to the original data. But this needs three times the
memory actually required for the data: (1) the checkpoint, (2) the record of the
iteration number at which each location was modified and (3) the original copy of the
data. If the data updated by the WHILE loop is sparse, this can be reduced by keeping
copies of only those elements that were modified, along with their time stamps. A
hash table can be used for this. Thus total checkpointing can be avoided.

A simple way to reduce the memory requirements is to strip mine [7] the loop,
i.e., first execute s iterations, then the next s iterations, and so on, for a suitable value
of s. In this case the memory needed is bounded by the product of s and the number of
elements written in one iteration.

Time stamping can be avoided if the WHILE loop is first run on checkpointed
data and, after the number of iterations has been determined, the loop is executed
simply as a DO ALL on the original data.
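The sparse variant with a hash table can be sketched as follows. This is one possible reading of the scheme: old values are saved together with the iteration number (time stamp) of the write, and writes made by iterations beyond the last valid one are undone afterwards. The sketch assumes each element is written by at most one iteration; all names are hypothetical.

```python
def speculative_run(data, iteration_writes, last_valid):
    """Apply writes (iteration, index, value), then undo those that overshoot.

    iteration_writes may arrive in any order, as in a parallel execution.
    """
    saved = {}   # hash table: index -> (original value, iteration that wrote it)
    for iteration, index, value in iteration_writes:
        saved.setdefault(index, (data[index], iteration))
        data[index] = value
    # Restore every element written by an iteration after the last valid one.
    for index, (old_value, iteration) in saved.items():
        if iteration > last_valid:
            data[index] = old_value
    return data


writes = [(3, 2, 30), (1, 0, 10), (4, 3, 40), (2, 1, 20)]
print(speculative_run([0] * 6, writes, last_valid=2))
```

Only the modified elements are copied, so the extra memory grows with the number of writes rather than with the size of the data.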
4.4 Implementation

The program accepted by this module consists of declarations, if-then-else
statements, WHILE loops (not nested) and assignment statements. This module
works in three phases.

• Phase I: Parsing

In Phase I the parser parses the program and constructs a syntax tree of
the program. The parser is generated using the compiler construction tools lex
and yacc.

• Phase II: Dependence analysis

In Phase II, a WHILE loop is identified and its dispatcher is found. The initial value
of the dispatcher (the reaching definition of the dispatcher variable at the WHILE loop)
is found. It must be a known constant for the dependence problem to be
analyzed at compile time. In some special cases, such as a dispatcher that is
incremented by 1, the dependence problem can be analyzed at compile time even
though the dispatcher's initial value is not known.

• Phase III: Code generation

The intermediate code contains DO ALL statements in addition to those of the source
language. Parallelized distributed loops are generated.
Analyze_dependences()
{
    for (all pairs of statements (s1, s2) inside the while) {
        common = (def(s1) ∩ def(s2)) ∪ (def(s1) ∩ use(s2)) ∪ (use(s1) ∩ def(s2));
        if (common != φ) {
            for (each possible dependence) {
                form dependence equations;
                call GAtest;
                if (dependence exists)
                    return (dependence exists)
            }
        }
    }
    return (no dependence)
}
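The set computation that selects candidate statement pairs can be sketched as follows. The dictionary representation of a statement's def and use sets is an assumption of the sketch, and the sample statements are hypothetical.

```python
def common_variables(s1, s2):
    """Variables through which s1 and s2 may depend on each other:
    output (def/def), flow (def/use) and anti (use/def) candidates."""
    return (s1["def"] & s2["def"]) | (s1["def"] & s2["use"]) | (s1["use"] & s2["def"])


# Hypothetical statements inside a WHILE loop body.
s1 = {"def": {"A"}, "use": {"B"}}     # A(i) = B(i) + 1
s2 = {"def": {"C"}, "use": {"A"}}     # C(i) = A(i-1)
s3 = {"def": {"D"}, "use": {"B"}}     # D(i) = B(i) * 2

print(common_variables(s1, s2))   # A is written by s1 and read by s2
print(common_variables(s1, s3))   # both only read B, so nothing to test
```

Pairs with an empty result need no dependence equations, which keeps the number of GAtest invocations down.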
Example 4.1: Distribution of a WHILE loop.


4.4.1 Cross-iteration dependence testing of WHILE loops

Unlike the dependence problem of DO loops, the dependence problem of WHILE
loops contains recurrence relations in addition to the constraints. The solution
suggested in Section 3.8 can be used for this dependence problem. In order to apply
GAtest, the subscript expressions must be linear and the lower bounds of the
variables must be known.

Sometimes it is possible to execute the WHILE loop as a WHILE ACROSS
(in analogy to DO ACROSS) if there is a loop-carried dependence and the dependence
distance is of suitable size. If all the bounds of the variables are known, then the
distance vector can be found using GAtest.

In some WHILE loops it is not possible to analyze the dependence
problem at compile time. This happens if subscripted subscripts are used or the
subscript expressions are non-linear. In such cases execution of the WHILE loop can be
started in parallel. If a dependence is found at run time, the parallel execution is
stopped and the loop is executed sequentially.

If it is determined at compile time that there are no flow dependences and the
only dependences are anti dependences or output dependences, then the loop can
be executed in parallel by privatizing the variables that are accessed. A variable
can be privatized if and only if all read accesses are preceded by a write access to that
variable. Even if it is not possible to analyze the dependence problem at compile
time, a WHILE loop can be parallelized using run time dependence checking, called the
privatizing DO ALL test (PDtest) [6].
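The privatization condition can be sketched for a single iteration as follows. The access-trace representation is an assumption made for illustration and is not the PDtest of [6], which performs its checks at run time on shadow structures.

```python
def privatizable(accesses):
    """Return the variables whose every read in the iteration is preceded by a write.

    accesses: ordered list of ("r" or "w", variable name) for one iteration.
    """
    written, exposed, names = set(), set(), set()
    for kind, name in accesses:
        names.add(name)
        if kind == "w":
            written.add(name)
        elif name not in written:
            exposed.add(name)       # read before any write in this iteration
    return names - exposed


# Hypothetical iteration body: t = ...; ... = t; ... = x; x = ...
trace = [("w", "t"), ("r", "t"), ("r", "x"), ("w", "x")]
print(privatizable(trace))
```

Here `t` can be given a private copy per iteration, while `x` cannot because its first access is a read of a value produced elsewhere.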

Except for the general-case dispatcher (e.g. the dispatcher of a linked list traversal
program), GAtest is applicable in many cases. Even though the PDtest can be done in
parallel, GAtest outweighs it because of the cumulative time the PDtest consumes over
many executions of the program. Due to technical reasons the author is unable to
present benchmarks of GAtest for WHILE loops against known techniques.

4.5 Errors in parallel execution of WHILE loop 

There are two types of errors that could occur during the speculative parallel
execution of a WHILE loop: (i) exceptions and (ii) the presence of cross-iteration
dependences in the loop. A simple way to deal with exceptions is to stop the
parallel execution, restore the previous values and execute the WHILE loop
sequentially. In order to detect cross-iteration dependences in the parallel execution of
a WHILE loop, the PDtest can be used. The PDtest is applied to each shared variable
that is accessed during the loop execution but whose accesses cannot be analyzed
at compile time.
Chapter 5


Conclusions


The dependence problem is such an important part of a restructuring compiler that it
has attracted the attention of many researchers. Many tests have been proposed. Most
of the simple tests are conservative in nature, whereas the accurate tests are too complex
to be used. From a study of some programs it was observed that in many cases
a simple test like Banerjee's test is enough to solve the dependence problem. The
only problem comes from the analysis of subroutines, which may contain formal
variables as loop limits. Singhai [12] suggested a suite of tests that can be used
to handle all the cases. Based on the characteristics of the problem, some of the tests
from the suite can be used.

GAtest is extended such that it can be used in almost all the cases. It requires
less time than other exact tests. GAtest can provide information such as dependence
direction and dependence distance to a program restructurer. Only GAtest
can provide information for optimal parallelization of a program, i.e. by its capability
of enumerating dependent iteration pairs, it can be used to extract fine-grain
parallelism.

The frequently observed unknown bounds problem in subroutines can be handled by
GAtest more efficiently than by some other tests that work with unknown bounds,
as shown in the comparison table of Section 3.9.

Many compilers treat WHILE loops as sequential loops. In many cases these
are parallelizable. Usually such loops contain an induction recurrence relation or
an associative recurrence relation. Currently there is no compile time technique that
can analyze the dependence problem involving recurrence relations. GAtest is
extended to handle this problem. Performance results showed that it can be included
in a practical program restructurer. Rauchwerger and Padua [5] suggested a run time
dependence analysis method that works with any type of complex references. But
in common cases it is unnecessary to postpone the dependence analysis problem
to run time, which incurs overhead in each execution of the program. If the upper
bound of the recurrence relation is also known then GAtest can be used to
exploit fine-grain parallelism in the case of WHILE loops also, leading to the introduction
of WHILE ACROSS loops in analogy to DO ACROSS loops. This is also helpful
in determining the size of the 'strip' in strip mining of the loop.

GAtest for unknown bounds can be extended to allow symbolic constants in the linear
diophantine equation also. These constants can be treated as special variables whose
value may be dependent on some loop limit of the region or simply a free variable.

GAtest can be extended to handle some non-linear equations too. There are two
problems in extending it:

• finding the bounds of the non-linear equation in the given search space,

• finding the value of a variable from the non-linear equation given the values
for the remaining variables.

GAtest is limited by the second problem. After substituting the values of all variables
except one into the non-linear equation, the result is a polynomial. It is
not practical to solve a polynomial of order more than 2. In practice such cases arise
rarely. GAtest, as it is now, is found to be a more general and simple exact test.





Bibliography 


[1] C.D.Polychronopoulos. Parallel programming and compilers. Kluwer academic
publishers, 1988.

[2] Gina Goff, Ken Kennedy, Chau-Wen Tseng. Practical dependence testing.
Proceedings of the ACM SIGPLAN-91 conference on programming language design
and implementation, Toronto, Ontario, Canada, June-1991.

[3] H.R.Sudheer. GAtest: An exact dependence test for loop parallelization.
Master's thesis, I.I.T., Kanpur, 1993.

[4] Jose L.Ribeiro Filho, Philip C.Treleaven, Cesare Alippi. Genetic algorithm
programming environments. IEEE Computer, June-1994.

[5] Lawrence Rauchwerger, David Padua. Parallelizing WHILE loops for multiprocessor
systems. Center for supercomputing research and development, University
of Illinois at Urbana Champaign, 1994.

[6] Lawrence Rauchwerger, David Padua. The privatizing DO ALL test: A run
time technique for DOALL loop identification and array privatization. Center
for supercomputing research and development, University of Illinois at Urbana
Champaign, 1994.

[7] Michael Wolfe. Optimizing supercompilers for supercomputers. The MIT press,
Cambridge, 1989.

[8] Michael Wolfe, Chau-Wen Tseng. The power test for data dependence. IEEE
transactions on parallel and distributed systems, Vol. 3, Sept-1992.


[9] Michael Wolfe. The Tiny loop restructuring research tool. In Proceedings of
1991 International Conference on Parallel Processing, 1991.

[10] M.Srinivas, Lalit M.Patnaik. Genetic algorithms: a survey. IEEE Computer,
June-1994.

[11] P.Suresh. On classifying programs for data dependence tests in parallelizing
compilers. A technical report, Dept. of Computer Science, I.I.T. Kanpur, 1995.

[12] S.Singhai. Data dependence tests for loop parallelization. Master's thesis, I.I.T.,
Kanpur, 1995.

[13] Utpal Banerjee. Dependence analysis for supercomputing. Kluwer academic
publishers, London, 1988.

[14] William Pugh. A practical algorithm for exact array dependence analysis.
Communications of ACM, Vol. 35, Aug-1992.

[15] Xiangyun Kong, David Klappholz, Kleanthis Psarris. The I test: A new test
for subscript data dependence. International conference on parallel processing,
1990.

[16] Zhiyuan Li, Pen-Chung Yew, Chuan-Qi Zhu. An efficient data dependence
analysis for parallelizing compilers. IEEE Transactions on parallel and distributed
systems, Vol. 1, Jan-1990.





